image

BoAt Product Review Analysis Project

Data source downloaded from Kaggle.com
Dataset Link :- Boat Product Review Dataset

In [1]:
# Libraties for data manipulation and cleaning
import pandas as pd 
import numpy as np


# Libraries for visualization
import matplotlib.pyplot as plt
%matplotlib inline 
import seaborn as sns
import plotly.express as pe
In [2]:
# Loading csv files into designated dataframe

bluetooth_speakers  = pd.read_csv('E:/Data Analyst Project By self/#3 BoAts Product Review Analysis/#1 Original Dataset/bluetooth speakers.csv')
boat_product = pd.read_csv('E:/Data Analyst Project By self/#3 BoAts Product Review Analysis/#1 Original Dataset/BoatProduct.csv')
limited_edition = pd.read_csv('E:/Data Analyst Project By self/#3 BoAts Product Review Analysis/#1 Original Dataset/Limited Edition.csv')
misfit = pd.read_csv('E:/Data Analyst Project By self/#3 BoAts Product Review Analysis/#1 Original Dataset/Misfit.csv')
mobile_accessories = pd.read_csv('E:/Data Analyst Project By self/#3 BoAts Product Review Analysis/#1 Original Dataset/Mobile Accessories.csv')
smart_watches = pd.read_csv('E:/Data Analyst Project By self/#3 BoAts Product Review Analysis/#1 Original Dataset/smart watches.csv')
trebel = pd.read_csv('E:/Data Analyst Project By self/#3 BoAts Product Review Analysis/#1 Original Dataset/TRebel.csv')
wired_headphones = pd.read_csv('E:/Data Analyst Project By self/#3 BoAts Product Review Analysis/#1 Original Dataset/wired headphones.csv')
wireless_earbuds = pd.read_csv('E:/Data Analyst Project By self/#3 BoAts Product Review Analysis/#1 Original Dataset/wireless earbuds.csv')
wireless_headphones = pd.read_csv('E:/Data Analyst Project By self/#3 BoAts Product Review Analysis/#1 Original Dataset/wireless headphones.csv')

Data Cleaning of Bluetooth Speakers csv file¶

In [3]:
bluetooth_speakers
Out[3]:
ProductName ProductPrice Discount NumberofReviews Rate Review Summary
0 Stone 1000v2 \nSale price₹ 3,999 43% off 7 reviews ★\n 5.0\n Fantastic product Very good product in this price range..
1 Stone 1000v2 Sale price₹ 3,999 43% off 7 reviews ★\n 5.0\n Outstanding Outstanding Product
2 Stone 1000v2 \nSale price₹ 3,999 43% off 7 reviews ★\n 5.0\n Awesome Works like a charm. Good sound quality. Visual...
3 Stone 1000v2 \nSale price₹ 3,999 43% off 7 reviews ★\n 5.0\n Boat stone 1000 v2 Great
4 Stone 1000v2 \nSale price₹ 3,999 43% off 7 reviews ★\n 5.0\n Quality sound Amazing sound quality.totally premium.&.afford...
... ... ... ... ... ... ... ...
109 Stone 500 \nSale price₹ 1,999 33% off 8 reviews ★\n 4.9\n Boat stone 500🔥 Boat stone 500 is the best speaker should buy ...
110 Stone 500 \nSale price₹ 1,999 33% off 8 reviews ★\n 4.9\n Awesome Great delivery, product is great too with the ...
111 Stone 500 \nSale price₹ 1,999 33% off 8 reviews ★\n 4.9\n NaN boAt Stone 500
112 Stone 500 \nSale price₹ 1,999 33% off 8 reviews ★\n 4.9\n NaN A good quality product
113 Stone 500 \nSale price₹ 1,999 33% off 8 reviews ★\n 4.9\n NaN boAt Stone 500

114 rows × 7 columns

In [4]:
# Stage 1 : Dropping duplicates on basis of ProductName
bluetooth_speakers.drop_duplicates(subset = 'ProductName',inplace = True)
bluetooth_speakers['ProductName'].iloc[1]

# Stage 2 : Triming spaces in ProductName using strip
bluetooth_speakers['ProductName'] = bluetooth_speakers['ProductName'].str.strip()
bluetooth_speakers['ProductName'].iloc[1]

# Stage 3 : Cleaning ProductPrice column and making data type integer
product_price = bluetooth_speakers['ProductPrice'].str.split(' ',n=2,expand = True)
bluetooth_speakers['ProductPrice'] = product_price[2]
bluetooth_speakers['ProductPrice'] = bluetooth_speakers['ProductPrice'].str.replace(',','')
bluetooth_speakers
bluetooth_speakers['ProductPrice'] = bluetooth_speakers['ProductPrice'].astype(int)

# Stage 4 : Cleaning NumberofReviews and making data type as int
reviews = bluetooth_speakers['NumberofReviews'].str.split(' ',n=2,expand = True)
reviews = reviews[0]
bluetooth_speakers['NumberofReviews'] = reviews
bluetooth_speakers.rename(columns = {'NumberofReviews':'Reviews'},inplace = True)
bluetooth_speakers['Reviews'] = bluetooth_speakers['Reviews'].astype(int)

# Stage 5 : Cleaning Discount Column
discount= bluetooth_speakers['Discount'].str.split(' ',n=2,expand = True)
discount = discount[0]
bluetooth_speakers['Discount'] = discount

# Stage 6 : Cleaning Rate and making data type as float
rate = bluetooth_speakers['Rate'].str.split(' ', n=1 , expand = True)
rate = rate[1]
bluetooth_speakers['Rate'] = rate
bluetooth_speakers['Rate'] = bluetooth_speakers['Rate'].str.replace('\n','')
bluetooth_speakers['Rate'] = bluetooth_speakers['Rate'].str.strip()
bluetooth_speakers['Rate'] = bluetooth_speakers['Rate'].astype(float)

# Stage 7 : dropping Review and Summary columns
bluetooth_speakers.drop(columns = ['Review','Summary'],inplace = True)
bluetooth_speakers = bluetooth_speakers.set_index('ProductName').reset_index()
In [5]:
bluetooth_speakers
Out[5]:
ProductName ProductPrice Discount Reviews Rate
0 Stone 1000v2 3999 43% 7 5.0
1 Stone Grenade 1499 62% 92 4.9
2 Stone 190 1299 57% 108 4.8
3 Stone 352 1699 51% 26 4.8
4 Stone 650 1999 60% 107 4.9
5 Stone 1200F 3999 43% 38 5.0
6 Stone 135 899 55% 17 4.9
7 Stone 180 939 62% 20 4.8
8 Rugby Plus 1999 60% 11 4.6
9 Stone 500 1999 33% 8 4.9
In [6]:
bluetooth_speakers.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ProductName   10 non-null     object 
 1   ProductPrice  10 non-null     int32  
 2   Discount      10 non-null     object 
 3   Reviews       10 non-null     int32  
 4   Rate          10 non-null     float64
dtypes: float64(1), int32(2), object(2)
memory usage: 448.0+ bytes
In [7]:
bluetooth_speakers.describe()
Out[7]:
ProductPrice Reviews Rate
count 10.000000 10.000000 10.000000
mean 2033.000000 43.400000 4.860000
std 1113.155674 41.878661 0.117379
min 899.000000 7.000000 4.600000
25% 1349.000000 12.500000 4.800000
50% 1849.000000 23.000000 4.900000
75% 1999.000000 78.500000 4.900000
max 3999.000000 108.000000 5.000000
In [ ]:
 

Data Cleaning of Limited Edition csv file¶

In [8]:
limited_edition
Out[8]:
ProductName ProductPrice Discount NumberofReviews Rate Review Summary
0 Rockerz 558 Sunburn Edition \nSale price₹ 1,999 60% off 1 review ★\n 4.0\n Good sound. Worth its price Product is comfy, looks premium, feels great (...
1 Airdopes 383 Sunburn Edition \nSale price₹ 2,499 50% off 9 reviews ★\n 5.0\n Excellent battery life Can listen for almost more than 6 hours in si...
2 Airdopes 383 Sunburn Edition \nSale price₹ 2,499 50% off 9 reviews ★\n 5.0\n NaN boAt Airdopes 383 Sunburn Edition
3 Airdopes 383 Sunburn Edition \nSale price₹ 2,499 50% off 9 reviews ★\n 5.0\n NaN boAt Airdopes 383 Sunburn Edition
4 Airdopes 383 Sunburn Edition \nSale price₹ 2,499 50% off 9 reviews ★\n 5.0\n NaN Thanks boat
... ... ... ... ... ... ... ...
135 Rockerz 450 DC edition \nSale price₹ 1,299 67% off 4 reviews ★\n 5.0\n Headphone Veerrrrrrrrry gooooooooddddddd
136 Rockerz 450 DC edition \nSale price₹ 1,299 67% off 4 reviews ★\n 5.0\n NaN Rockerz 450 DC edition
137 Rockerz 450 DC edition \nSale price₹ 1,299 67% off 4 reviews ★\n 5.0\n NaN Rockerz 450 DC edition
138 Rockerz 450 DC edition \nSale price₹ 1,299 67% off 4 reviews ★\n 5.0\n Loved it Really really loved it ❣️
139 Rockerz 450 \nSale price₹ 1,299 67% off 1 review ★\n 5.0\n Good Good

140 rows × 7 columns

In [9]:
# Stage 1 : Dropping duplicates on basis of ProductName
limited_edition.drop_duplicates(subset = 'ProductName',inplace = True)
limited_edition['ProductName'].iloc[1]

# Stage 2 : Triming spaces in ProductName using strip
limited_edition['ProductName'] = limited_edition['ProductName'].str.strip()
limited_edition['ProductName'].iloc[1]

# Stage 3 : Cleaning ProductPrice column and making data type integer
product_price = limited_edition['ProductPrice'].str.split(' ',n=2,expand = True)
limited_edition['ProductPrice'] = product_price[2]
limited_edition['ProductPrice'] = limited_edition['ProductPrice'].str.replace(',','')
limited_edition
limited_edition['ProductPrice'] = limited_edition['ProductPrice'].astype(int)

# Stage 4 : Cleaning NumberofReviews and making data type as int
reviews = limited_edition['NumberofReviews'].str.split(' ',n=2,expand = True)
reviews = reviews[0]
limited_edition['NumberofReviews'] = reviews
limited_edition.rename(columns = {'NumberofReviews':'Reviews'},inplace = True)
limited_edition['Reviews'] = limited_edition['Reviews'].astype(int)

# Stage 5 : Cleaning Discount Column
discount= limited_edition['Discount'].str.split(' ',n=2,expand = True)
discount = discount[0]
limited_edition['Discount'] = discount

# Stage 6 : Cleaning Rate and making data type as float
rate = limited_edition['Rate'].str.split(' ', n=1 , expand = True)
rate = rate[1]
limited_edition['Rate'] = rate
limited_edition['Rate'] = limited_edition['Rate'].str.replace('\n','')
limited_edition['Rate'] = limited_edition['Rate'].str.strip()
limited_edition['Rate'] = limited_edition['Rate'].astype(float)

# Stage 7 : dropping Review and Summary columns
limited_edition.drop(columns = ['Review','Summary'],inplace = True)
limited_edition = limited_edition.set_index('ProductName').reset_index()
In [10]:
limited_edition
Out[10]:
ProductName ProductPrice Discount Reviews Rate
0 Rockerz 558 Sunburn Edition 1999 60% 1 4.0
1 Airdopes 383 Sunburn Edition 2499 50% 9 5.0
2 Airdopes 131 Captain America Marvel Edition 1599 47% 43 4.9
3 Airdopes 131 Iron Man Marvel Edition 1599 47% 70 4.9
4 Bassheads 172 Sunburn Edition 349 73% 12 4.8
5 BassHeads 152 Sunburn Edition 399 69% 20 4.9
6 Airdopes 381 Sunburn Edition 2999 40% 1 5.0
7 Watch Storm - Captain America Marvel Edition 1999 67% 1 5.0
8 Airdopes 441 KKR Edition 2499 58% 1 5.0
9 Stone 190 - Captain America Marvel Edition 1299 57% 2 5.0
10 Stone SpinX 2.0 Bira Edition 2099 70% 7 4.9
11 Stone 190 - Iron Man Marvel Edition 1299 57% 3 5.0
12 Airdopes 381 Masaba Edition 2299 54% 2 5.0
13 Stone 190 - Black Panther Marvel Edition 1299 57% 1 5.0
14 Rockerz 450 - Iron Man Marvel Edition 1799 55% 8 5.0
15 Watch Storm - Black Panther Marvel Edition 1999 67% 2 5.0
16 Rockerz 550 Sunburn Edition 1499 70% 14 4.9
17 Rockerz 600 KKR Edition 2499 50% 4 5.0
18 Airdopes 441 - Masaba Edition 1799 70% 6 4.7
19 BassHeads 152 KKR Edition 549 57% 21 4.8
20 Bassheads 172 KKR Edition 499 62% 1 5.0
21 Airdopes 131 Batman DC Edition 999 67% 13 4.9
22 Airdopes 131 999 67% 1 5.0
23 Airdopes 441 Pro Special Batman Edition 2199 69% 24 5.0
24 Rockerz 450 Batman DC Edition 1299 67% 8 4.9
25 Rockerz 450 DC edition 1299 67% 4 5.0
26 Rockerz 450 1299 67% 1 5.0
In [11]:
limited_edition.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27 entries, 0 to 26
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ProductName   27 non-null     object 
 1   ProductPrice  27 non-null     int32  
 2   Discount      27 non-null     object 
 3   Reviews       27 non-null     int32  
 4   Rate          27 non-null     float64
dtypes: float64(1), int32(2), object(2)
memory usage: 992.0+ bytes
In [ ]:
 

Data Cleaning of Misfit csv file¶

In [12]:
misfit
Out[12]:
ProductName ProductPrice Discount NumberofReviews Rate Review Summary
0 Misfit T50 Trimmer \nSale price₹ 989 60% off 20 reviews ★\n 4.9\n NaN Thanks for the quality
1 Misfit T50 Trimmer \nSale price₹ 989 60% off 20 reviews ★\n 4.9\n Good Good product
2 Misfit T50 Trimmer \nSale price₹ 989 60% off 20 reviews ★\n 4.9\n NaN Misfit T50 Trimmer
3 Misfit T50 Trimmer \nSale price₹ 989 60% off 20 reviews ★\n 4.9\n Awesome product I didn't expected this quality at this price
4 Misfit T50 Trimmer \nSale price₹ 989 60% off 20 reviews ★\n 4.9\n Great I have used this trimmer many times after buyi...
5 Misfit T50 Trimmer \nSale price₹ 989 60% off 20 reviews ★\n 4.9\n Nice product Premium quality
6 Misfit T50 Lite \nSale price₹ 879 56% off 4 reviews ★\n 4.5\n NaN Good
7 Misfit T50 Lite \nSale price₹ 879 56% off 4 reviews ★\n 4.5\n NaN Sleek design, nice performance.\nSatisfied wit...
8 Misfit T50 Lite \nSale price₹ 879 56% off 4 reviews ★\n 4.5\n Nice Nice
9 Misfit T50 Lite \nSale price₹ 879 56% off 4 reviews ★\n 4.5\n Very nice Good delivery, nice trimmer and boat good brand
10 Misfit T30 Trimmer \nSale price₹ 769 49% off 5 reviews ★\n 4.8\n NaN VERY GOOD PERFORMANCE
11 Misfit T30 Trimmer \nSale price₹ 769 49% off 5 reviews ★\n 4.8\n Nice one Very skin friendly blades. \nNormal power and ...
12 Misfit T30 Trimmer \nSale price₹ 769 49% off 5 reviews ★\n 4.8\n Best Good grooming performance
13 Misfit T30 Trimmer \nSale price₹ 769 49% off 5 reviews ★\n 4.8\n Fantastic Very nice
14 Misfit T30 Trimmer \nSale price₹ 769 49% off 5 reviews ★\n 4.8\n NaN Nish
15 Misfit T200 3-in-1 Grooming Kit for Men \nSale price₹ 1,199 70% off 2 reviews ★\n 5.0\n NaN Best at this segment
16 Misfit T200 3-in-1 Grooming Kit for Men \nSale price₹ 1,199 70% off 2 reviews ★\n 5.0\n Outstanding performance Nice packaging and no doubt \nYou can buy it w...
17 Misfit T200 \nSale price₹ 1,699 58% off 42 reviews ★\n 4.9\n NaN Misfit T200
18 Misfit T200 \nSale price₹ 1,699 58% off 42 reviews ★\n 4.9\n Very good 😀😀😀
19 Misfit T200 \nSale price₹ 1,699 58% off 42 reviews ★\n 4.9\n very good excellent item
20 Misfit T200 \nSale price₹ 1,699 58% off 42 reviews ★\n 4.9\n Amazing worth taking Great experience
21 Misfit T200 \nSale price₹ 1,699 58% off 42 reviews ★\n 4.9\n Product Products and packaging both are really good
22 Misfit T200 \nSale price₹ 1,699 58% off 42 reviews ★\n 4.9\n Good Product Used it for the first time, seemed quite smoot...
23 Misfit T150 Trimmer \nSale price₹ 1,429 59% off 1 review ★\n 4.0\n BoAt misfit T150 trimmer Product is excellent but deliver time might be...
24 Misfit T50 Trimmer \nSale price₹ 989 60% off 20 reviews ★\n 4.9\n NaN Thanks for the quality
25 Misfit T50 Trimmer \nSale price₹ 989 60% off 20 reviews ★\n 4.9\n Good Good product
26 Misfit T50 Trimmer \nSale price₹ 989 60% off 20 reviews ★\n 4.9\n NaN Misfit T50 Trimmer
27 Misfit T50 Trimmer \nSale price₹ 989 60% off 20 reviews ★\n 4.9\n Awesome product I didn't expected this quality at this price
28 Misfit T50 Trimmer \nSale price₹ 989 60% off 20 reviews ★\n 4.9\n Great I have used this trimmer many times after buyi...
29 Misfit T50 Trimmer \nSale price₹ 989 60% off 20 reviews ★\n 4.9\n Nice product Premium quality
30 Misfit T50 Lite \nSale price₹ 879 56% off 4 reviews ★\n 4.5\n NaN Good
31 Misfit T50 Lite \nSale price₹ 879 56% off 4 reviews ★\n 4.5\n NaN Sleek design, nice performance.\nSatisfied wit...
32 Misfit T50 Lite \nSale price₹ 879 56% off 4 reviews ★\n 4.5\n Nice Nice
33 Misfit T50 Lite \nSale price₹ 879 56% off 4 reviews ★\n 4.5\n Very nice Good delivery, nice trimmer and boat good brand
34 Misfit T30 Trimmer \nSale price₹ 769 49% off 5 reviews ★\n 4.8\n NaN VERY GOOD PERFORMANCE
35 Misfit T30 Trimmer \nSale price₹ 769 49% off 5 reviews ★\n 4.8\n Nice one Very skin friendly blades. \nNormal power and ...
36 Misfit T30 Trimmer \nSale price₹ 769 49% off 5 reviews ★\n 4.8\n Best Good grooming performance
37 Misfit T30 Trimmer \nSale price₹ 769 49% off 5 reviews ★\n 4.8\n Fantastic Very nice
38 Misfit T30 Trimmer \nSale price₹ 769 49% off 5 reviews ★\n 4.8\n NaN Nish
39 Misfit T200 3-in-1 Grooming Kit for Men \nSale price₹ 1,199 70% off 2 reviews ★\n 5.0\n NaN Best at this segment
40 Misfit T200 3-in-1 Grooming Kit for Men \nSale price₹ 1,199 70% off 2 reviews ★\n 5.0\n Outstanding performance Nice packaging and no doubt \nYou can buy it w...
41 Misfit T200 \nSale price₹ 1,699 58% off 42 reviews ★\n 4.9\n NaN Misfit T200
42 Misfit T200 \nSale price₹ 1,699 58% off 42 reviews ★\n 4.9\n Very good 😀😀😀
43 Misfit T200 \nSale price₹ 1,699 58% off 42 reviews ★\n 4.9\n very good excellent item
44 Misfit T200 \nSale price₹ 1,699 58% off 42 reviews ★\n 4.9\n Amazing worth taking Great experience
45 Misfit T200 \nSale price₹ 1,699 58% off 42 reviews ★\n 4.9\n Product Products and packaging both are really good
46 Misfit T200 \nSale price₹ 1,699 58% off 42 reviews ★\n 4.9\n Good Product Used it for the first time, seemed quite smoot...
47 Misfit T150 Trimmer \nSale price₹ 1,429 59% off 1 review ★\n 4.0\n BoAt misfit T150 trimmer Product is excellent but deliver time might be...
In [13]:
# Stage 1 : Dropping duplicates on basis of ProductName
misfit.drop_duplicates(['ProductName'],inplace  = True)
misfit.set_index('ProductName').reset_index()

# Stage 2 : Triming spaces in ProductName using strip
misfit['ProductName']  = misfit['ProductName'].str.strip()
misfit['ProductName'].iloc[1]

# Stage 3 : Cleaning ProductPrice column and making data type integer
price = misfit['ProductPrice'].str.split(' ',n=2,expand = True)
price = price[2]
misfit['ProductPrice'] = price
misfit['ProductPrice'] = misfit['ProductPrice'].str.replace(',','') 
misfit['ProductPrice'] = misfit['ProductPrice'].astype(int)


# Stage 4 : Cleaning NumberofReviews and making data type as int
review = misfit['NumberofReviews'].str.split(' ',n=2,expand = True)
review = review[0]
misfit['NumberofReviews'] = review
misfit.rename(columns = {'NumberofReviews':'Reviews'},inplace = True)
misfit['Reviews'] = misfit['Reviews'].astype(int) 

# Stage 5 : Cleaning Discount Column
discount= misfit['Discount'].str.split(' ',n=2,expand = True)
discount = discount[0]
misfit['Discount'] = discount

# Stage 6 : Cleaning Rate and making data type as float
rate = misfit['Rate'].str.split(' ', n=1 , expand = True)
rate = rate[1]
misfit['Rate'] = rate
misfit['Rate'] = misfit['Rate'].str.replace('\n','')
misfit['Rate'] = misfit['Rate'].str.strip()
misfit['Rate'] = misfit['Rate'].astype(float)


# Stage 7 : dropping Review and Summary columns
misfit.drop(columns = ['Review','Summary'],inplace = True)
misfit = misfit.set_index('ProductName').reset_index()
In [14]:
misfit
Out[14]:
ProductName ProductPrice Discount Reviews Rate
0 Misfit T50 Trimmer 989 60% 20 4.9
1 Misfit T50 Lite 879 56% 4 4.5
2 Misfit T30 Trimmer 769 49% 5 4.8
3 Misfit T200 3-in-1 Grooming Kit for Men 1199 70% 2 5.0
4 Misfit T200 1699 58% 42 4.9
5 Misfit T150 Trimmer 1429 59% 1 4.0
In [15]:
misfit.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ProductName   6 non-null      object 
 1   ProductPrice  6 non-null      int32  
 2   Discount      6 non-null      object 
 3   Reviews       6 non-null      int32  
 4   Rate          6 non-null      float64
dtypes: float64(1), int32(2), object(2)
memory usage: 320.0+ bytes
In [ ]:
 

Data Cleaning of Mobile Accessories csv file¶

In [16]:
mobile_accessories
Out[16]:
ProductName ProductPrice Discount NumberofReviews Rate Review Summary
0 Mask \nSale price₹ 499 17% off 147 reviews ★\n 4.9\n Excellent!! One of the most comfortable masks available in...
1 Mask \nSale price₹ 499 17% off 147 reviews ★\n 4.9\n Good mask Quality is good and good to use . size is for ...
2 Mask \nSale price₹ 499 17% off 147 reviews ★\n 4.9\n The best in range Super bass with very good spec.... amazing ne...
3 Mask \nSale price₹ 499 17% off 147 reviews ★\n 4.9\n Excellent Product Top notch product amongst other items in the m...
4 Mask \nSale price₹ 499 17% off 147 reviews ★\n 4.9\n Excellent product Very comfortable..
... ... ... ... ... ... ... ...
113 Dual QC-PD Port Rapid Car Charger With Type C... \nSale price₹ 599 60% off 5 reviews ★\n 5.0\n Awesome product Best car charger
114 Dual QC-PD Port Rapid Car Charger With Type C... \nSale price₹ 599 60% off 5 reviews ★\n 5.0\n Its response is very good It's data cable and fast charging is too much ...
115 Dual QC-PD Port Rapid Car Charger With Type C... \nSale price₹ 599 60% off 5 reviews ★\n 5.0\n NaN boAt Dual QC-PD Port Rapid Car Charger With Ty...
116 Dual QC-PD Port Rapid Car Charger With Type C... \nSale price₹ 599 60% off 5 reviews ★\n 5.0\n NaN boAt Dual QC-PD Port Rapid Car Charger With Ty...
117 Dual QC-PD Port Rapid Car Charger With Type C... \nSale price₹ 599 60% off 5 reviews ★\n 5.0\n Very good Product Its great product but just lower the price a bit.

118 rows × 7 columns

In [17]:
# Stage 1 : Dropping duplicates on basis of ProductName

mobile_accessories.drop_duplicates('ProductName',inplace = True)
mobile_accessories['ProductName'].iloc[1]

# Stage 2 : Triming spaces in ProductName using strip
mobile_accessories['ProductName'] = mobile_accessories['ProductName'].str.strip()
mobile_accessories['ProductName'].iloc[1]

# Stage 3 : Cleaning ProductPrice column and making data type integer
product_price = mobile_accessories['ProductPrice'].str.split(' ',n=2,expand = True)
mobile_accessories['ProductPrice'] = product_price[2]
mobile_accessories['ProductPrice'] = mobile_accessories['ProductPrice'].str.replace(',','')
mobile_accessories
mobile_accessories['ProductPrice'] = mobile_accessories['ProductPrice'].astype(int)

# Stage 4 : Cleaning NumberofReviews and making data type as int
reviews = mobile_accessories['NumberofReviews'].str.split(' ',n=2,expand = True)
reviews = reviews[0]
mobile_accessories['NumberofReviews'] = reviews
mobile_accessories.rename(columns = {'NumberofReviews':'Reviews'},inplace = True)
mobile_accessories['Reviews'] = mobile_accessories['Reviews'].astype(int)

# Stage 5 : Cleaning Discount Column
discount= mobile_accessories['Discount'].str.split(' ',n=2,expand = True)
discount = discount[0]
mobile_accessories['Discount'] = discount

# Stage 6 : Cleaning Rate and making data type as float
rate = mobile_accessories['Rate'].str.split(' ', n=1 , expand = True)
rate = rate[1]
mobile_accessories['Rate'] = rate
mobile_accessories['Rate'] = mobile_accessories['Rate'].str.replace('\n','')
mobile_accessories['Rate'] = mobile_accessories['Rate'].str.strip()
mobile_accessories['Rate'] = mobile_accessories['Rate'].astype(float)

# Stage 7 : dropping Review and Summary columns
mobile_accessories.drop(columns = ['Review','Summary'],inplace = True)
mobile_accessories = mobile_accessories.set_index('ProductName').reset_index()
In [18]:
mobile_accessories
Out[18]:
ProductName ProductPrice Discount Reviews Rate
0 Mask 499 17% 147 4.9
1 Deuce 330 299 57% 4 5.0
2 LTG 500 Indestructible Apple Certified Lightni... 899 49% 26 4.8
3 Micro USB 150 1.5 Meter 499 50% 2 5.0
4 A400 USB Type C Data Cable 349 65% 15 4.9
5 Type C C400 499 50% 6 4.8
6 Rugged V3 Micro USB 1.5 Meter 299 63% 63 4.9
7 Deuce USB 300 299 57% 76 4.8
8 Micro USB 500 Cable 1.5 Meter 279 65% 25 4.9
9 LTG 500 Indestructible Apple Certified Lightni... 799 47% 48 4.8
10 Energyshroom PB10 949 53% 1 5.0
11 Energyshroom PB9 949 53% 2 5.0
12 Dual Port Rapid Car Charger (Qualcomm Certified) 449 50% 7 5.0
13 Micro USB 100 1 Meter 199 72% 23 4.9
14 Dual QC-PD Port Rapid Car Charger With Power D... 699 53% 2 5.0
15 Dual QC-PD Port Rapid Car Charger With Type C ... 599 60% 5 5.0
In [19]:
mobile_accessories.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ProductName   16 non-null     object 
 1   ProductPrice  16 non-null     int32  
 2   Discount      16 non-null     object 
 3   Reviews       16 non-null     int32  
 4   Rate          16 non-null     float64
dtypes: float64(1), int32(2), object(2)
memory usage: 640.0+ bytes
In [ ]:
 

Data Cleaning of Smart Watches csv file¶

In [20]:
smart_watches
Out[20]:
ProductName ProductPrice Discount NumberofReviews Rate Review Summary
0 Wave Select \nSale price₹ 2,099 70% off 13 reviews ★\n 5.0\n Very good Always satisfied with your product
1 Wave Select \nSale price₹ 2,099 70% off 13 reviews ★\n 5.0\n great product excellent
2 Wave Select \nSale price₹ 2,099 70% off 13 reviews ★\n 5.0\n Good Product I just love it
3 Wave Select \nSale price₹ 2,099 70% off 13 reviews ★\n 5.0\n NaN boAt Wave Select | Best Sports Edition Smartwa...
4 Wave Select \nSale price₹ 2,099 70% off 13 reviews ★\n 5.0\n NaN boAt Wave Select | Best Sports Edition Smartwa...
... ... ... ... ... ... ... ...
191 TRebel Watch Straps \nSale price₹ 399 56% off 1 review ★\n 5.0\n NaN nice strap
192 Wave Voice \nSale price₹ 2,699 55% off 1 review ★\n 5.0\n NaN I asked for exchange, I still didn't get response
193 Storm RTL \nSale price₹ 2,999 50% off 3 reviews ★\n 5.0\n Best watch Best watch in world
194 Storm RTL \nSale price₹ 2,999 50% off 3 reviews ★\n 5.0\n Watch Nice
195 Storm RTL \nSale price₹ 2,999 50% off 3 reviews ★\n 5.0\n Very good quality Good brand

196 rows × 7 columns

In [21]:
# Stage 1 : Dropping duplicates on basis of ProductName
smart_watches.drop_duplicates(subset = 'ProductName',inplace = True)
smart_watches['ProductName'].iloc[1]

# Stage 2 : Triming spaces in ProductName using strip
smart_watches['ProductName'] = smart_watches['ProductName'].str.strip()
smart_watches['ProductName'].iloc[1]

# Stage 3 : Cleaning ProductPrice column and making data type integer
product_price = smart_watches['ProductPrice'].str.split(' ',n=2,expand = True)
smart_watches['ProductPrice'] = product_price[2]
smart_watches['ProductPrice'] = smart_watches['ProductPrice'].str.replace(',','')
smart_watches
smart_watches['ProductPrice'] = smart_watches['ProductPrice'].astype(int)

# Stage 4 : Cleaning NumberofReviews and making data type as int
reviews = smart_watches['NumberofReviews'].str.split(' ',n=2,expand = True)
reviews = reviews[0]
smart_watches['NumberofReviews'] = reviews
smart_watches.rename(columns = {'NumberofReviews':'Reviews'},inplace = True)
smart_watches['Reviews'] = smart_watches['Reviews'].astype(int)

# Stage 5 : Cleaning Discount Column
discount= smart_watches['Discount'].str.split(' ',n=2,expand = True)
discount = discount[0]
smart_watches['Discount'] = discount

# Stage 6 : Cleaning Rate and making data type as float
rate = smart_watches['Rate'].str.split(' ', n=1 , expand = True)
rate = rate[1]
smart_watches['Rate'] = rate
smart_watches['Rate'] = smart_watches['Rate'].str.replace('\n','')
smart_watches['Rate'] = smart_watches['Rate'].str.strip()
smart_watches['Rate'] = smart_watches['Rate'].astype(float)

# Stage 7 : dropping Review and Summary columns
smart_watches.drop(columns = ['Review','Summary'],inplace = True)
smart_watches = smart_watches.set_index('ProductName').reset_index()
In [22]:
smart_watches
Out[22]:
ProductName ProductPrice Discount Reviews Rate
0 Wave Select 2099 70% 13 5.0
1 Storm Call 1799 77% 23 4.7
2 Wave Beat 1499 75% 5 4.6
3 Wave Neo 1499 75% 20 4.7
4 Wave Connect 2999 57% 88 4.9
5 Xtend‌ 2699 66% 146 4.8
6 Wave Call 1799 77% 41 4.9
7 Wave Lite 1699 76% 55 4.9
8 Watch Flash 1399 80% 48 4.8
9 Storm 1499 75% 1008 4.8
10 Wave Style 1449 76% 17 4.7
11 Xplorer‌‌ O2 3999 33% 29 4.8
12 Wave Pro 1499 79% 12 5.0
13 Blaze 3499 50% 20 5.0
14 Matrix 3499 71% 11 5.0
15 Xtend Talk 2999 57% 7 4.9
16 Storm Pro 2499 64% 10 5.0
17 Xtend Sport 2499 64% 7 4.9
18 Mystiq 2499 64% 5 4.8
19 Wave Pro 47 1799 74% 15 4.9
20 TRebel Matrix 4999 58% 1 5.0
21 TRebel Blaze 3999 43% 6 5.0
22 TRebel Xtend‌ 3499 56% 3 5.0
23 Wave Play 1999 67% 1 4.0
24 Watch Storm - Iron Man Marvel Edition 1999 67% 2 5.0
25 Watch Storm - Black Panther Marvel Edition 1999 67% 2 5.0
26 Wave Ultima 2999 50% 4 5.0
27 Watch Storm - Captain America Marvel Edition 1999 67% 1 5.0
28 TRebel Watch Straps 399 56% 1 5.0
29 Wave Voice 2699 55% 1 5.0
30 Storm RTL 2999 50% 3 5.0
In [23]:
smart_watches.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31 entries, 0 to 30
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ProductName   31 non-null     object 
 1   ProductPrice  31 non-null     int32  
 2   Discount      31 non-null     object 
 3   Reviews       31 non-null     int32  
 4   Rate          31 non-null     float64
dtypes: float64(1), int32(2), object(2)
memory usage: 1.1+ KB
In [ ]:
 

Data Cleaning of Trebel csv file¶

In [24]:
trebel
Out[24]:
ProductName ProductPrice Discount NumberofReviews Rate Review Summary
0 TRebel Blaze \nSale price₹ 3,999 43% off 6 reviews ★\n 5.0\n NaN TRebel Watch Blaze
1 TRebel Blaze \nSale price₹ 3,999 43% off 6 reviews ★\n 5.0\n Good features Good features for the price. Decent UI
2 TRebel Blaze \nSale price₹ 3,999 43% off 6 reviews ★\n 5.0\n Very good Very good quality product
3 TRebel Blaze \nSale price₹ 3,999 43% off 6 reviews ★\n 5.0\n Awesome watch. I strongly recommend everyone to buy this watc...
4 TRebel Blaze \nSale price₹ 3,999 43% off 6 reviews ★\n 5.0\n Working well Good product, satisfied after a week of use,ba...
... ... ... ... ... ... ... ...
138 TRebel Airdopes 441 Pro \nSale price₹ 2,999 50% off 5 reviews ★\n 4.8\n Awesome This is the right one for my love \nAnd perfec...
139 TRebel Rockerz 255 Pro \nSale price₹ 1,499 57% off 4 reviews ★\n 5.0\n Product description good Excellent package
140 TRebel Rockerz 255 Pro \nSale price₹ 1,499 57% off 4 reviews ★\n 5.0\n NaN Good product
141 TRebel Rockerz 255 Pro \nSale price₹ 1,499 57% off 4 reviews ★\n 5.0\n Boat Nice bluetooth
142 TRebel Rockerz 255 Pro \nSale price₹ 1,499 57% off 4 reviews ★\n 5.0\n Awesome fully satisfied with the product

143 rows × 7 columns

In [25]:
# Stage 1 : Dropping duplicates on basis of ProductName
trebel.drop_duplicates(subset = 'ProductName',inplace = True)
trebel['ProductName'].iloc[1]

# Stage 2 : Triming spaces in ProductName using strip
trebel['ProductName'] = trebel['ProductName'].str.strip()
trebel['ProductName'].iloc[1]

# Stage 3 : Cleaning ProductPrice column and making data type integer
product_price = trebel['ProductPrice'].str.split(' ',n=2,expand = True)
trebel['ProductPrice'] = product_price[2]
trebel['ProductPrice'] = trebel['ProductPrice'].str.replace(',','')
trebel
trebel['ProductPrice'] = trebel['ProductPrice'].astype(int)

# Stage 4 : Cleaning NumberofReviews and making data type as int
reviews = trebel['NumberofReviews'].str.split(' ',n=2,expand = True)
reviews = reviews[0]
trebel['NumberofReviews'] = reviews
trebel.rename(columns = {'NumberofReviews':'Reviews'},inplace = True)
trebel['Reviews'] = trebel['Reviews'].astype(int)

# Stage 5 : Cleaning Discount Column
discount= trebel['Discount'].str.split(' ',n=2,expand = True)
discount = discount[0]
trebel['Discount'] = discount

# Stage 6 : Cleaning Rate and making data type as float
rate = trebel['Rate'].str.split(' ', n=1 , expand = True)
rate = rate[1]
trebel['Rate'] = rate
trebel['Rate'] = trebel['Rate'].str.replace('\n','')
trebel['Rate'] = trebel['Rate'].str.strip()
trebel['Rate'] = trebel['Rate'].astype(float)

# Stage 7 : dropping Review and Summary columns
trebel.drop(columns = ['Review','Summary'],inplace = True)
trebel = trebel.set_index('ProductName').reset_index()
In [26]:
trebel
Out[26]:
ProductName ProductPrice Discount Reviews Rate
0 TRebel Blaze 3999 43% 6 5.0
1 TRebel Rockerz 333 1599 60% 6 5.0
2 TRebel Airdopes 381 2199 56% 12 4.8
3 TRebel Airdopes 402 1999 67% 9 4.8
4 TRebel BassHeads 100 399 60% 17 4.9
5 TRebel Xtend‌ 3499 56% 3 5.0
6 TRebel Rockerz 450 1499 62% 1 5.0
7 TRebel BassHeads 103 499 61% 11 5.0
8 TRebel Airdopes 141 1499 67% 63 5.0
9 TRebel BassHeads 102 399 69% 7 5.0
10 TRebel Airdopes 171 - Wireless Earbuds with 6m... 1799 70% 3 5.0
11 TRebel Matrix 4999 58% 1 5.0
12 TRebel Rockerz 330 Pro 1799 40% 3 5.0
13 Trebel Rockerz 255 Pro+ 1599 60% 7 4.7
14 TRebel Airdopes 181 1599 47% 4 4.8
15 TRebel Watch Straps 399 56% 1 5.0
16 TRebel Rockerz 235 V2 1199 60% 20 4.8
17 TRebel Airdopes 131 1299 57% 56 4.7
18 TRebel BassHeads 152 499 61% 11 5.0
19 TRebel Airdopes 441 Pro 2999 50% 5 4.8
20 TRebel Rockerz 255 Pro 1499 57% 4 5.0
In [27]:
trebel.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ProductName   21 non-null     object 
 1   ProductPrice  21 non-null     int32  
 2   Discount      21 non-null     object 
 3   Reviews       21 non-null     int32  
 4   Rate          21 non-null     float64
dtypes: float64(1), int32(2), object(2)
memory usage: 800.0+ bytes
In [ ]:
 

Data Cleaning of Wired Headphones csv file¶

In [28]:
wired_headphones
Out[28]:
ProductName ProductPrice Discount NumberofReviews Rate Review Summary
0 Bassheads 900 \nSale price₹ 849 66% off 169 reviews ★\n 4.8\n Overall good experience. good.
1 Bassheads 900 \nSale price₹ 849 66% off 169 reviews ★\n 4.8\n Good Good product
2 Bassheads 900 \nSale price₹ 849 66% off 169 reviews ★\n 4.8\n NaN Useful product
3 Bassheads 900 \nSale price₹ 849 66% off 169 reviews ★\n 4.8\n NaN Best quality ever
4 Bassheads 900 \nSale price₹ 849 66% off 169 reviews ★\n 4.8\n NaN The build is excellent. You won't get any bett...
... ... ... ... ... ... ... ...
148 BassHeads 225 \nSale price₹ 699 30% off 7 reviews ★\n 4.9\n NaN So far good
149 BassHeads 225 \nSale price₹ 699 30% off 7 reviews ★\n 4.9\n Good earphones Good earphones
150 BassHeads 225 \nSale price₹ 699 30% off 7 reviews ★\n 4.9\n ❤️❤️❤️ Awesome product...heavy bass and better clarit...
151 BassHeads 225 \nSale price₹ 699 30% off 7 reviews ★\n 4.9\n Nice product Boat never compromised on quality... I liked t...
152 BassHeads 225 \nSale price₹ 699 30% off 7 reviews ★\n 4.9\n good good

153 rows × 7 columns

In [29]:
# Stage 1 : Dropping duplicates on basis of ProductName
wired_headphones.drop_duplicates(subset = 'ProductName',inplace = True)
wired_headphones['ProductName'].iloc[1]

# Stage 2 : Triming spaces in ProductName using strip
wired_headphones['ProductName'] = wired_headphones['ProductName'].str.strip()
wired_headphones['ProductName'].iloc[1]

# Stage 3 : Cleaning ProductPrice column and making data type integer
product_price = wired_headphones['ProductPrice'].str.split(' ',n=2,expand = True)
wired_headphones['ProductPrice'] = product_price[2]
wired_headphones['ProductPrice'] = wired_headphones['ProductPrice'].str.replace(',','')
wired_headphones
wired_headphones['ProductPrice'] = wired_headphones['ProductPrice'].astype(int)

# Stage 4 : Cleaning NumberofReviews and making data type as int
reviews = wired_headphones['NumberofReviews'].str.split(' ',n=2,expand = True)
reviews = reviews[0]
wired_headphones['NumberofReviews'] = reviews
wired_headphones.rename(columns = {'NumberofReviews':'Reviews'},inplace = True)
wired_headphones['Reviews'] = wired_headphones['Reviews'].astype(int)

# Stage 5 : Cleaning Discount Column
discount= wired_headphones['Discount'].str.split(' ',n=2,expand = True)
discount = discount[0]
wired_headphones['Discount'] = discount

# Stage 6 : Cleaning Rate and making data type as float
rate = wired_headphones['Rate'].str.split(' ', n=1 , expand = True)
rate = rate[1]
wired_headphones['Rate'] = rate
wired_headphones['Rate'] = wired_headphones['Rate'].str.replace('\n','')
wired_headphones['Rate'] = wired_headphones['Rate'].str.strip()
wired_headphones['Rate'] = wired_headphones['Rate'].astype(float)

# Stage 7 : dropping Review and Summary columns
wired_headphones.drop(columns = ['Review','Summary'],inplace = True)
wired_headphones = wired_headphones.set_index('ProductName').reset_index()
In [30]:
wired_headphones
Out[30]:
ProductName ProductPrice Discount Reviews Rate
0 Bassheads 900 849 66% 169 4.8
1 BassHeads 242 599 60% 32 4.9
2 TRebel BassHeads 242 599 60% 371 4.8
3 BassHeads 152 449 65% 73 4.8
4 BassHeads 172 499 62% 135 4.8
5 BassHeads 100 399 60% 423 4.8
6 BassHeads 102 449 65% 157 4.8
7 BassHeads 103 499 61% 552 4.8
8 BassHeads 220 699 30% 45 4.8
9 TRebel BassHeads 100 399 60% 17 4.9
10 TRebel BassHeads 152 499 61% 11 5.0
11 BassHeads 104 449 55% 66 4.7
12 TRebel BassHeads 102 399 69% 7 5.0
13 TRebel BassHeads 103 499 61% 11 5.0
14 Bassheads 192 - Make in India 449 55% 1 5.0
15 BassHeads 225 399 60% 301 4.8
16 BassHeads 228 699 41% 1 5.0
17 BassHeads 110 449 55% 1 5.0
In [31]:
wired_headphones.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ProductName   18 non-null     object 
 1   ProductPrice  18 non-null     int32  
 2   Discount      18 non-null     object 
 3   Reviews       18 non-null     int32  
 4   Rate          18 non-null     float64
dtypes: float64(1), int32(2), object(2)
memory usage: 704.0+ bytes
In [ ]:
 

Data Cleaning of Wireless Earbuds csv file¶

In [32]:
wireless_earbuds
Out[32]:
ProductName ProductPrice Discount NumberofReviews Rate Review Summary
0 Airdopes 131 \nSale price₹ 999 67% off 1298 reviews ★\n 4.8\n Airdopes 131 Good product, sound quality is also good
1 Airdopes 131 \nSale price₹ 999 67% off 1298 reviews ★\n 4.8\n Loved the product Product is simply awesome. Loved this.
2 Airdopes 131 \nSale price₹ 999 67% off 1298 reviews ★\n 4.8\n Airdopes not fit in years Worst product no exchange no return policy. No...
3 Airdopes 131 \nSale price₹ 999 67% off 1298 reviews ★\n 4.8\n Airdopes 131 So cool
4 Airdopes 131 \nSale price₹ 999 67% off 1298 reviews ★\n 4.8\n best aipods good quality
... ... ... ... ... ... ... ...
204 Airdopes 131 Iron Man Marvel Edition \nSale price₹ 1,599 47% off 70 reviews ★\n 4.9\n Jhakkas 🔥💯🤘😎 It's awesome thank you boAt 👍
205 Airdopes 131 Iron Man Marvel Edition \nSale price₹ 1,599 47% off 70 reviews ★\n 4.9\n Super Useful Thank u boat
206 Airdopes 131 Iron Man Marvel Edition \nSale price₹ 1,599 47% off 70 reviews ★\n 4.9\n Awesome product Worthy product according to price. And persona...
207 Airdopes 131 Iron Man Marvel Edition \nSale price₹ 1,599 47% off 70 reviews ★\n 4.9\n Very good Very good
208 Airdopes 131 Iron Man Marvel Edition \nSale price₹ 1,599 47% off 70 reviews ★\n 4.9\n This is amazing Sound is too good

209 rows × 7 columns

In [33]:
# Stage 1 : Dropping duplicates on basis of ProductName
wireless_earbuds.drop_duplicates(subset = 'ProductName',inplace = True)
wireless_earbuds['ProductName'].iloc[1]

# Stage 2 : Triming spaces in ProductName using strip
wireless_earbuds['ProductName'] = wireless_earbuds['ProductName'].str.strip()
wireless_earbuds['ProductName'].iloc[1]

# Stage 3 : Cleaning ProductPrice column and making data type integer
product_price = wireless_earbuds['ProductPrice'].str.split(' ',n=2,expand = True)
wireless_earbuds['ProductPrice'] = product_price[2]
wireless_earbuds['ProductPrice'] = wireless_earbuds['ProductPrice'].str.replace(',','')
wireless_earbuds
wireless_earbuds['ProductPrice'] = wireless_earbuds['ProductPrice'].astype(int)

# Stage 4 : Cleaning NumberofReviews and making data type as int
reviews = wireless_earbuds['NumberofReviews'].str.split(' ',n=2,expand = True)
reviews = reviews[0]
wireless_earbuds['NumberofReviews'] = reviews
wireless_earbuds.rename(columns = {'NumberofReviews':'Reviews'},inplace = True)
wireless_earbuds['Reviews'] = wireless_earbuds['Reviews'].astype(int)

# Stage 5 : Cleaning Discount Column
discount= wireless_earbuds['Discount'].str.split(' ',n=2,expand = True)
discount = discount[0]
wireless_earbuds['Discount'] = discount

# Stage 6 : Cleaning Rate and making data type as float
rate = wireless_earbuds['Rate'].str.split(' ', n=1 , expand = True)
rate = rate[1]
wireless_earbuds['Rate'] = rate
wireless_earbuds['Rate'] = wireless_earbuds['Rate'].str.replace('\n','')
wireless_earbuds['Rate'] = wireless_earbuds['Rate'].str.strip()
wireless_earbuds['Rate'] = wireless_earbuds['Rate'].astype(float)

# Stage 7 : dropping Review and Summary columns
wireless_earbuds.drop(columns = ['Review','Summary'],inplace = True)
wireless_earbuds = wireless_earbuds.set_index('ProductName').reset_index()
In [34]:
wireless_earbuds
Out[34]:
ProductName ProductPrice Discount Reviews Rate
0 Airdopes 131 999 67% 1298 4.8
1 Airdopes 131 PRO 1199 60% 157 4.9
2 Airdopes 411 ANC 2499 50% 68 4.8
3 Airdopes 121 PRO 1499 50% 40 4.9
4 Airdopes 141 1499 67% 598 4.8
5 Airdopes 121 v2 1299 57% 562 4.8
6 Airdopes 402 1499 75% 479 4.8
7 Airdopes 115 999 67% 82 4.9
8 TRebel Airdopes 141 1499 67% 63 5.0
9 TRebel Airdopes 131 1299 57% 56 4.7
10 Airdopes 161 1381 45% 119 4.9
11 Airdopes 441 1999 67% 289 4.9
12 Airdopes 141 Pro 1699 43% 37 4.9
13 Airdopes 381 1999 60% 122 4.8
14 Airdopes 413 ANC 2499 50% 14 5.0
15 Airdopes 138 1499 50% 7 5.0
16 TRebel Airdopes 381 2199 56% 12 4.8
17 TRebel Airdopes 402 1999 67% 9 4.8
18 Airdopes 171 1499 75% 69 4.8
19 Airdopes 181 1599 47% 36 4.9
20 Airdopes 451v2 2999 50% 33 4.8
21 Airdopes 131 Captain America Marvel Edition 1599 47% 43 4.9
22 Airdopes 601 ANC 2999 70% 3 5.0
23 Airdopes 441 Pro 2999 50% 154 4.8
24 TRebel Airdopes 441 Pro 2999 50% 5 4.8
25 Airdopes 500 ANC 4499 55% 3 5.0
26 Airdopes 131 Iron Man Marvel Edition 1599 47% 70 4.9
In [35]:
wireless_earbuds.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27 entries, 0 to 26
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ProductName   27 non-null     object 
 1   ProductPrice  27 non-null     int32  
 2   Discount      27 non-null     object 
 3   Reviews       27 non-null     int32  
 4   Rate          27 non-null     float64
dtypes: float64(1), int32(2), object(2)
memory usage: 992.0+ bytes
In [ ]:
 

Data Cleaning of Wireless Headphones csv file¶

In [36]:
wireless_headphones
Out[36]:
ProductName ProductPrice Discount NumberofReviews Rate Review Summary
0 Rockerz 235 V2 \nSale price₹ 1,183 60% off 1067 reviews ★\n 4.8\n Good Sonda good
1 Rockerz 235 V2 \nSale price₹ 1,183 60% off 1067 reviews ★\n 4.8\n Best The best product ever received.
2 Rockerz 235 V2 \nSale price₹ 1,183 60% off 1067 reviews ★\n 4.8\n NaN boAt Rockerz 235 V2 | Bluetooth Stereo Wireles...
3 Rockerz 235 V2 \nSale price₹ 1,183 60% off 1067 reviews ★\n 4.8\n NaN Supar
4 Rockerz 235 V2 \nSale price₹ 1,183 60% off 1067 reviews ★\n 4.8\n NaN The product is perfect.
... ... ... ... ... ... ... ...
282 Trebel Rockerz 255 Pro+ \nSale price₹ 1,599 60% off 7 reviews ★\n 4.7\n NaN Trebel Rockerz 255 Pro+
283 Trebel Rockerz 255 Pro+ \nSale price₹ 1,599 60% off 7 reviews ★\n 4.7\n Awesome product I am using this since 10 months and fully sati...
284 Trebel Rockerz 255 Pro+ \nSale price₹ 1,599 60% off 7 reviews ★\n 4.7\n Awesome Sound quality good best Material
285 Trebel Rockerz 255 Pro+ \nSale price₹ 1,599 60% off 7 reviews ★\n 4.7\n Best service Best service and best product very staisfied
286 Trebel Rockerz 255 Pro+ \nSale price₹ 1,599 60% off 7 reviews ★\n 4.7\n Worthy deal except the ipl tickets :| Haven't heard my friends saying," tu earphones...

287 rows × 7 columns

In [37]:
# Stage 1 : Dropping duplicates on basis of ProductName
wireless_headphones.drop_duplicates(subset = 'ProductName',inplace = True)
wireless_headphones['ProductName'].iloc[1]

# Stage 2 : Triming spaces in ProductName using strip
wireless_headphones['ProductName'] = wireless_headphones['ProductName'].str.strip()
wireless_headphones['ProductName'].iloc[1]

# Stage 3 : Cleaning ProductPrice column and making data type integer
product_price = wireless_headphones['ProductPrice'].str.split(' ',n=2,expand = True)
wireless_headphones['ProductPrice'] = product_price[2]
wireless_headphones['ProductPrice'] = wireless_headphones['ProductPrice'].str.replace(',','')
wireless_headphones
wireless_headphones['ProductPrice'] = wireless_headphones['ProductPrice'].astype(int)

# Stage 4 : Cleaning NumberofReviews and making data type as int
reviews = wireless_headphones['NumberofReviews'].str.split(' ',n=2,expand = True)
reviews = reviews[0]
wireless_headphones['NumberofReviews'] = reviews
wireless_headphones.rename(columns = {'NumberofReviews':'Reviews'},inplace = True)
wireless_headphones['Reviews'] = wireless_headphones['Reviews'].astype(int)

# Stage 5 : Cleaning Discount Column
discount= wireless_headphones['Discount'].str.split(' ',n=2,expand = True)
discount = discount[0]
wireless_headphones['Discount'] = discount

# Stage 6 : Cleaning Rate and making data type as float
rate = wireless_headphones['Rate'].str.split(' ', n=1 , expand = True)
rate = rate[1]
wireless_headphones['Rate'] = rate
wireless_headphones['Rate'] = wireless_headphones['Rate'].str.replace('\n','')
wireless_headphones['Rate'] = wireless_headphones['Rate'].str.strip()
wireless_headphones['Rate'] = wireless_headphones['Rate'].astype(float)

# Stage 7 : dropping Review and Summary columns
wireless_headphones.drop(columns = ['Review','Summary'],inplace = True)
wireless_headphones = wireless_headphones.set_index('ProductName').reset_index()
In [38]:
wireless_headphones
Out[38]:
ProductName ProductPrice Discount Reviews Rate
0 Rockerz 235 V2 1183 60% 1067 4.8
1 Rockerz 255 Pro+ 1299 67% 401 4.8
2 Rockerz 245 V2 1299 57% 206 4.8
3 Rockerz 330 1299 57% 266 4.8
4 Rockerz 333 1399 65% 186 4.9
5 Rockerz 255 1299 57% 228 4.8
6 Rockerz 550 1999 60% 345 4.8
7 Rockerz 255 Pro 1399 60% 383 4.8
8 Rockerz 103 Pro 999 60% 64 4.9
9 Rockerz 450 1499 62% 267 4.8
10 Rockerz 255R 1299 57% 29 4.9
11 Rockerz 333 Pro 1799 40% 157 4.9
12 Rockerz 260 1299 48% 99 4.9
13 Rockerz 335 1699 57% 350 4.8
14 Rockerz 510 1299 67% 142 4.9
15 Rockerz 370 1199 52% 79 4.8
16 Rockerz 255 Neo 1299 57% 95 4.8
17 Rockerz 261 849 83% 67 4.8
18 Rockerz 333 ANC 1899 68% 48 5.0
19 Rockerz 385 V2 1499 70% 73 4.8
20 Rockerz 235 Pro 999 50% 48 4.9
21 Rockerz 255F 1299 57% 46 4.8
22 Rockerz 260 v1 1199 52% 40 4.9
23 Rockerz 330 Pro 1799 40% 47 4.9
24 Rockerz 400 1499 50% 95 4.8
25 Rockerz 245 Pro 1299 48% 16 4.8
26 Rockerz 205 Pro 999 60% 10 4.9
27 TRebel Rockerz 255 Pro 1499 57% 4 5.0
28 Rockerz 255 Pro Made in India 1399 60% 24 4.9
29 TRebel Rockerz 235 V2 1199 60% 20 4.8
30 Rockerz 330 ANC 1999 67% 12 4.7
31 Rockerz 600 1699 57% 35 4.8
32 Rockerz 375 999 75% 11 4.9
33 Rockerz 518 1599 60% 44 4.9
34 Nirvanaa 751 ANC 3999 50% 27 4.7
35 Rockerz 450R 1499 62% 10 5.0
36 Rockerz 425 1499 50% 22 4.9
37 Rockerz 450 Batman DC Edition 1299 67% 8 4.9
38 Rockerz 255 ARC 1299 48% 1 4.0
39 Trebel Rockerz 255 Pro+ 1599 60% 7 4.7
In [39]:
wireless_headphones.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ProductName   40 non-null     object 
 1   ProductPrice  40 non-null     int32  
 2   Discount      40 non-null     object 
 3   Reviews       40 non-null     int32  
 4   Rate          40 non-null     float64
dtypes: float64(1), int32(2), object(2)
memory usage: 1.4+ KB
In [ ]:
 

Exploratory Data Analysis [ EDA ]

In [ ]:
 

Q1. Top 20 Product with Most Reviews and Discount.¶

In [40]:
# bluetooth_speakers
most_reviews_discount_1 = bluetooth_speakers.sort_values(['Reviews','Discount'],ascending = False).head()
most_reviews_discount_1 = most_reviews_discount_1.set_index('ProductName').reset_index()
most_reviews_discount_1
product_type = {'Product_Type':['Bluetooth Speakers','Bluetooth Speakers','Bluetooth Speakers','Bluetooth Speakers','Bluetooth Speakers',]}
product_type = pd.DataFrame(product_type)
most_reviews_discount_1['ProductType'] = product_type
most_reviews_discount_1

# limited_edition
most_reviews_discount_2 = limited_edition.sort_values(['Reviews','Discount'],ascending = False).head()
most_reviews_discount_2 = most_reviews_discount_2.set_index('ProductName').reset_index()
most_reviews_discount_2
product_type = {'Product_Type':['Limited Edition','Limited Edition','Limited Edition','Limited Edition','Limited Edition']}
product_type = pd.DataFrame(product_type)
most_reviews_discount_2['ProductType'] = product_type
most_reviews_discount_2

# misfit
most_reviews_discount_3 = misfit.sort_values(['Reviews','Discount'],ascending = False).head()
most_reviews_discount_3 = most_reviews_discount_3.set_index('ProductName').reset_index()
most_reviews_discount_3
product_type = {'Product_Type':['Misfit','Misfit','Misfit','Misfit','Misfit']}
product_type = pd.DataFrame(product_type)
most_reviews_discount_3['ProductType'] = product_type
most_reviews_discount_3

# mobile_accessories
most_reviews_discount_4 = mobile_accessories.sort_values(['Reviews','Discount'],ascending = False).head()
most_reviews_discount_4 = most_reviews_discount_4.set_index('ProductName').reset_index()
most_reviews_discount_4
product_type = {'Product_Type':['Mobile Accessories','Mobile Accessories','Mobile Accessories','Mobile Accessories','Mobile Accessories']}
product_type = pd.DataFrame(product_type)
most_reviews_discount_4['ProductType'] = product_type
most_reviews_discount_4

# smart_watches
most_reviews_discount_5 = smart_watches.sort_values(['Reviews','Discount'],ascending = False).head()
most_reviews_discount_5 = most_reviews_discount_5.set_index('ProductName').reset_index()
most_reviews_discount_5
product_type = {'Product_Type':['Smart Watches','Smart Watches','Smart Watches','Smart Watches','Smart Watches']}
product_type = pd.DataFrame(product_type)
most_reviews_discount_5['ProductType'] = product_type
most_reviews_discount_5

# trebel
most_reviews_discount_6 = trebel.sort_values(['Reviews','Discount'],ascending = False).head()
most_reviews_discount_6 = most_reviews_discount_6.set_index('ProductName').reset_index()
most_reviews_discount_6
product_type = {'Product_Type':['TRebel','TRebel','TRebel','TRebel','TRebel',]}
product_type = pd.DataFrame(product_type)
most_reviews_discount_6['ProductType'] = product_type
most_reviews_discount_6

# wired_headphones
most_reviews_discount_7 = wired_headphones.sort_values(['Reviews','Discount'],ascending = False).head()
most_reviews_discount_7 = most_reviews_discount_7.set_index('ProductName').reset_index()
most_reviews_discount_7
product_type = {'Product_Type':['Wired Headphones','Wired Headphones','Wired Headphones','Wired Headphones','Wired Headphones']}
product_type = pd.DataFrame(product_type)
most_reviews_discount_7['ProductType'] = product_type
most_reviews_discount_7

# wireless_earbuds
most_reviews_discount_8 = wireless_earbuds.sort_values(['Reviews','Discount'],ascending = False).head()
most_reviews_discount_8 = most_reviews_discount_8.set_index('ProductName').reset_index()
most_reviews_discount_8
product_type = {'Product_Type':['Wireless Earbuds','Wireless Earbuds','Wireless Earbuds','Wireless Earbuds','Wireless Earbuds']}
product_type = pd.DataFrame(product_type)
most_reviews_discount_8['ProductType'] = product_type
most_reviews_discount_8

# wireless_headphones
most_reviews_discount_9 = wireless_headphones['Discount'].dropna()
most_reviews_discount_9 = wireless_headphones.sort_values(['Reviews','Discount'],ascending = False).head()
most_reviews_discount_9 = most_reviews_discount_9.set_index('ProductName').reset_index()
most_reviews_discount_9
product_type = {'Product_Type':['Wireless Headphones','Wireless Headphones','Wireless Headphones','Wireless Headphones','Wireless Headphones']}
product_type = pd.DataFrame(product_type)
most_reviews_discount_9['ProductType'] = product_type
most_reviews_discount_9
Out[40]:
ProductName ProductPrice Discount Reviews Rate ProductType
0 Rockerz 235 V2 1183 60% 1067 4.8 Wireless Headphones
1 Rockerz 255 Pro+ 1299 67% 401 4.8 Wireless Headphones
2 Rockerz 255 Pro 1399 60% 383 4.8 Wireless Headphones
3 Rockerz 335 1699 57% 350 4.8 Wireless Headphones
4 Rockerz 550 1999 60% 345 4.8 Wireless Headphones
In [41]:
most_reviews_discount_product = pd.concat([most_reviews_discount_1,most_reviews_discount_2,most_reviews_discount_3,most_reviews_discount_4,most_reviews_discount_5,most_reviews_discount_6,most_reviews_discount_7,most_reviews_discount_8,most_reviews_discount_9])
In [42]:
most_reviews_discount_product = most_reviews_discount_product.sort_values(['Reviews','Discount'],ascending =False).head(20)
most_reviews_discount_product.set_index('ProductName').reset_index()
Out[42]:
ProductName ProductPrice Discount Reviews Rate ProductType
0 Airdopes 131 999 67% 1298 4.8 Wireless Earbuds
1 Rockerz 235 V2 1183 60% 1067 4.8 Wireless Headphones
2 Storm 1499 75% 1008 4.8 Smart Watches
3 Airdopes 141 1499 67% 598 4.8 Wireless Earbuds
4 Airdopes 121 v2 1299 57% 562 4.8 Wireless Earbuds
5 BassHeads 103 499 61% 552 4.8 Wired Headphones
6 Airdopes 402 1499 75% 479 4.8 Wireless Earbuds
7 BassHeads 100 399 60% 423 4.8 Wired Headphones
8 Rockerz 255 Pro+ 1299 67% 401 4.8 Wireless Headphones
9 Rockerz 255 Pro 1399 60% 383 4.8 Wireless Headphones
10 TRebel BassHeads 242 599 60% 371 4.8 Wired Headphones
11 Rockerz 335 1699 57% 350 4.8 Wireless Headphones
12 Rockerz 550 1999 60% 345 4.8 Wireless Headphones
13 BassHeads 225 399 60% 301 4.8 Wired Headphones
14 Airdopes 441 1999 67% 289 4.9 Wireless Earbuds
15 Bassheads 900 849 66% 169 4.8 Wired Headphones
16 Mask 499 17% 147 4.9 Mobile Accessories
17 Xtend‌ 2699 66% 146 4.8 Smart Watches
18 Stone 190 1299 57% 108 4.8 Bluetooth Speakers
19 Stone 650 1999 60% 107 4.9 Bluetooth Speakers

Airdopes 402 Has The Most Reviews And Has The Highest Percentage Of Discount And Is Best In Wireless Earbuds Segment.

Airdopes 402

In [43]:
most_reviews_discount_product = most_reviews_discount_product.groupby('ProductType').count().iloc[:,[0]]
In [44]:
most_reviews_discount_product.rename(columns = {'ProductName':'Total Products'},inplace = True)
most_reviews_discount_product =  most_reviews_discount_product.reset_index()
most_reviews_discount_product
Out[44]:
ProductType Total Products
0 Bluetooth Speakers 2
1 Mobile Accessories 1
2 Smart Watches 2
3 Wired Headphones 5
4 Wireless Earbuds 5
5 Wireless Headphones 5
In [45]:
# Figure Size
plt.figure(figsize=(10,5))
print(' ')

# For Graph Title
plt.title('Different Products Segment that has highest percentage of Discount and Reviews',
         fontsize = 15,
         fontweight = 'bold',
         fontname = 'Times New Roman')

print('  ')

# For plotting Pie Chart
plt.pie(x = most_reviews_discount_product['Total Products'],
        autopct = '%1.1f%%',
        explode = [0.1,0.1,0.1,0.1,0.1,0.1],
       labels = most_reviews_discount_product['ProductType'],
       colors = ['lightskyblue']);
 
  

Wired Headphones , wireless Earbuds and Wireless Headphones have more Reviews and more Discount.

In [ ]:
 

Q2. Top 10 Products with Highest Rating and Highest Reviews.¶

In [46]:
# bluetooth_speakers
top_5_rated_products_1 = bluetooth_speakers.sort_values(['Rate','Reviews'],ascending = False).head()
top_5_rated_products_1 = top_5_rated_products_1.set_index('ProductName').reset_index()
top_5_rated_products_1
product_type = {'Product_Type':['Bluetooth Speakers','Bluetooth Speakers','Bluetooth Speakers','Bluetooth Speakers','Bluetooth Speakers',]}
product_type = pd.DataFrame(product_type)
top_5_rated_products_1['ProductType'] = product_type
top_5_rated_products_1

# limited_edition
top_5_rated_products_2 = limited_edition.sort_values(['Rate','Reviews'],ascending = False).head()
top_5_rated_products_2 = top_5_rated_products_2.set_index('ProductName').reset_index()
top_5_rated_products_2
product_type = {'Product_Type':['Limited Edition','Limited Edition','Limited Edition','Limited Edition','Limited Edition']}
product_type = pd.DataFrame(product_type)
top_5_rated_products_2['ProductType'] = product_type
top_5_rated_products_2

# misfit
top_5_rated_products_3 = misfit.sort_values(['Rate','Reviews'],ascending = False).head()
top_5_rated_products_3 = top_5_rated_products_3.set_index('ProductName').reset_index()
top_5_rated_products_3
product_type = {'Product_Type':['Misfit','Misfit','Misfit','Misfit','Misfit']}
product_type = pd.DataFrame(product_type)
top_5_rated_products_3['ProductType'] = product_type
top_5_rated_products_3

# mobile_accessories
top_5_rated_products_4 = mobile_accessories.sort_values(['Rate','Reviews'],ascending = False).head()
top_5_rated_products_4 = top_5_rated_products_4.set_index('ProductName').reset_index()
top_5_rated_products_4
product_type = {'Product_Type':['Mobile Accessories','Mobile Accessories','Mobile Accessories','Mobile Accessories','Mobile Accessories']}
product_type = pd.DataFrame(product_type)
top_5_rated_products_4['ProductType'] = product_type
top_5_rated_products_4

# smart_watches
top_5_rated_products_5 = smart_watches.sort_values(['Rate','Reviews'],ascending = False).head()
top_5_rated_products_5 = top_5_rated_products_5.set_index('ProductName').reset_index()
top_5_rated_products_5
product_type = {'Product_Type':['Smart Watches','Smart Watches','Smart Watches','Smart Watches','Smart Watches']}
product_type = pd.DataFrame(product_type)
top_5_rated_products_5['ProductType'] = product_type
top_5_rated_products_5

# trebel
top_5_rated_products_6 = trebel.sort_values(['Reviews','Discount'],ascending = False).head()
top_5_rated_products_6 = top_5_rated_products_6.set_index('ProductName').reset_index()
top_5_rated_products_6
product_type = {'Product_Type':['TRebel','TRebel','TRebel','TRebel','TRebel',]}
product_type = pd.DataFrame(product_type)
top_5_rated_products_6['ProductType'] = product_type
top_5_rated_products_6

# wired_headphones
top_5_rated_products_7 = wired_headphones.sort_values(['Rate','Reviews'],ascending = False).head()
top_5_rated_products_7 = top_5_rated_products_7.set_index('ProductName').reset_index()
top_5_rated_products_7
product_type = {'Product_Type':['Wired Headphones','Wired Headphones','Wired Headphones','Wired Headphones','Wired Headphones']}
product_type = pd.DataFrame(product_type)
top_5_rated_products_7['ProductType'] = product_type
top_5_rated_products_7

# wireless_earbuds
top_5_rated_products_8 = wireless_earbuds.sort_values(['Rate','Reviews'],ascending = False).head()
top_5_rated_products_8 = top_5_rated_products_8.set_index('ProductName').reset_index()
top_5_rated_products_8
product_type = {'Product_Type':['Wireless Earbuds','Wireless Earbuds','Wireless Earbuds','Wireless Earbuds','Wireless Earbuds']}
product_type = pd.DataFrame(product_type)
top_5_rated_products_8['ProductType'] = product_type
top_5_rated_products_8

# wireless_headphones
top_5_rated_products_9 = wireless_headphones['Discount'].dropna()
top_5_rated_products_9 = wireless_headphones.sort_values(['Rate','Reviews'],ascending = False).head()
top_5_rated_products_9 = top_5_rated_products_9.set_index('ProductName').reset_index()
top_5_rated_products_9
product_type = {'Product_Type':['Wireless Headphones','Wireless Headphones','Wireless Headphones','Wireless Headphones','Wireless Headphones']}
product_type = pd.DataFrame(product_type)
top_5_rated_products_9['ProductType'] = product_type
In [47]:
top_5_rated_products = pd.concat([top_5_rated_products_1,top_5_rated_products_2,top_5_rated_products_3,top_5_rated_products_4,top_5_rated_products_5,top_5_rated_products_6,top_5_rated_products_7,top_5_rated_products_8,top_5_rated_products_9])
In [48]:
top_5_rated_products = top_5_rated_products.sort_values(['Rate','Reviews'],ascending = False)
top_5_rated_products = top_5_rated_products.where(top_5_rated_products['Reviews'] > 10).dropna()
top_5_rated_products = top_5_rated_products.set_index('ProductName').reset_index().head(10)
top_5_rated_products = top_5_rated_products.iloc[:,[0,3,4,5]]
top_5_rated_products
Out[48]:
ProductName Reviews Rate ProductType
0 TRebel Airdopes 141 63.0 5.0 TRebel
1 TRebel Airdopes 141 63.0 5.0 Wireless Earbuds
2 Rockerz 333 ANC 48.0 5.0 Wireless Headphones
3 Stone 1200F 38.0 5.0 Bluetooth Speakers
4 Airdopes 441 Pro Special Batman Edition 24.0 5.0 Limited Edition
5 Blaze 20.0 5.0 Smart Watches
6 Airdopes 413 ANC 14.0 5.0 Wireless Earbuds
7 Wave Select 13.0 5.0 Smart Watches
8 Wave Pro 12.0 5.0 Smart Watches
9 Matrix 11.0 5.0 Smart Watches
In [49]:
# Figure Size
plt.figure(figsize=(20,6))

# Plotting Barplot
ax = sns.barplot(data = top_5_rated_products,
            x = 'ProductName',
            y = 'Reviews',
           palette = 'cool')

# For Labels
ax.bar_label(ax.containers[0])

# X-axis Customization
plt.xticks(rotation = 45,fontsize = 13)
plt.xlabel('', fontsize = 20)

# y-axis Customization
plt.yticks(fontsize = 13)
plt.ylabel('', fontsize = 20)

# Increasing Data Labels Size
sns.set(font_scale = 1.2)

# Title
plt.title("Highest Number of Reviews by Product Names",fontsize= 25);

TRebel Airdopes 141 which falls in T-Rebel category has the Highest Rating and Most Number of Reviews.

TRebel Airdopes 141

In [ ]:
 

Q3. Product with Most Number Of Reviews.¶

In [50]:
# bluetooth_speakers
highest_reviews_1 = bluetooth_speakers.sort_values(['Reviews'],ascending = False).head()
highest_reviews_1 = highest_reviews_1.set_index('ProductName').reset_index()
highest_reviews_1 
product_type = {'Product_Type':['Bluetooth Speakers','Bluetooth Speakers','Bluetooth Speakers','Bluetooth Speakers','Bluetooth Speakers',]}
product_type = pd.DataFrame(product_type)
highest_reviews_1 ['ProductType'] = product_type
highest_reviews_1 

# limited_edition
highest_reviews_2= limited_edition.sort_values(['Reviews'],ascending = False).head()
highest_reviews_2= highest_reviews_2.set_index('ProductName').reset_index()
highest_reviews_2
product_type = {'Product_Type':['Limited Edition','Limited Edition','Limited Edition','Limited Edition','Limited Edition']}
product_type = pd.DataFrame(product_type)
highest_reviews_2['ProductType'] = product_type
highest_reviews_2


# misfit
highest_reviews_3 = misfit.sort_values(['Reviews'],ascending = False).head()
highest_reviews_3 = highest_reviews_3.set_index('ProductName').reset_index()
highest_reviews_3
product_type = {'Product_Type':['Misfit','Misfit','Misfit','Misfit','Misfit']}
product_type = pd.DataFrame(product_type)
highest_reviews_3['ProductType'] = product_type
highest_reviews_3


# mobile_accessories
highest_reviews_4 = mobile_accessories.sort_values(['Reviews'],ascending = False).head()
highest_reviews_4 = highest_reviews_4.set_index('ProductName').reset_index()
highest_reviews_4
product_type = {'Product_Type':['Mobile Accessories','Mobile Accessories','Mobile Accessories','Mobile Accessories','Mobile Accessories']}
product_type = pd.DataFrame(product_type)
highest_reviews_4['ProductType'] = product_type
highest_reviews_4

# smart_watches
highest_reviews_5 = smart_watches.sort_values(['Reviews'],ascending = False).head()
highest_reviews_5 = highest_reviews_5.set_index('ProductName').reset_index()
highest_reviews_5
product_type = {'Product_Type':['Smart Watches','Smart Watches','Smart Watches','Smart Watches','Smart Watches']}
product_type = pd.DataFrame(product_type)
highest_reviews_5['ProductType'] = product_type
highest_reviews_5


# trebel
highest_reviews_6 = trebel.sort_values(['Reviews'],ascending = False).head()
highest_reviews_6 = highest_reviews_6.set_index('ProductName').reset_index()
highest_reviews_6
product_type = {'Product_Type':['TRebel','TRebel','TRebel','TRebel','TRebel',]}
product_type = pd.DataFrame(product_type)
highest_reviews_6['ProductType'] = product_type
highest_reviews_6


# wired_headphones
highest_reviews_7 = wired_headphones.sort_values(['Reviews'],ascending = False).head()
highest_reviews_7 = highest_reviews_7.set_index('ProductName').reset_index()
highest_reviews_7
product_type = {'Product_Type':['Wired Headphones','Wired Headphones','Wired Headphones','Wired Headphones','Wired Headphones']}
product_type = pd.DataFrame(product_type)
highest_reviews_7['ProductType'] = product_type
highest_reviews_7

# wireless_earbuds
highest_reviews_8 = wireless_earbuds.sort_values(['Reviews'],ascending = False).head()
highest_reviews_8 = highest_reviews_8.set_index('ProductName').reset_index()
highest_reviews_8
product_type = {'Product_Type':['Wireless Earbuds','Wireless Earbuds','Wireless Earbuds','Wireless Earbuds','Wireless Earbuds']}
product_type = pd.DataFrame(product_type)
highest_reviews_8['ProductType'] = product_type
highest_reviews_8

# wireless_headphones
highest_reviews_9 = wireless_headphones['Discount'].dropna()
highest_reviews_9 = wireless_headphones.sort_values(['Reviews'],ascending = False).head()
highest_reviews_9 = highest_reviews_9.set_index('ProductName').reset_index()
highest_reviews_9
product_type = {'Product_Type':['Wireless Headphones','Wireless Headphones','Wireless Headphones','Wireless Headphones','Wireless Headphones']}
product_type = pd.DataFrame(product_type)
highest_reviews_9['ProductType'] = product_type
In [51]:
highest_reviews = pd.concat([highest_reviews_1,highest_reviews_2,highest_reviews_3,highest_reviews_4,highest_reviews_5,highest_reviews_6,highest_reviews_7,highest_reviews_8,highest_reviews_9])
In [52]:
highest_reviews = highest_reviews.sort_values('Reviews',ascending = False).set_index('ProductName').reset_index()
highest_reviews.head(10)
Out[52]:
ProductName ProductPrice Discount Reviews Rate ProductType
0 Airdopes 131 999 67% 1298 4.8 Wireless Earbuds
1 Rockerz 235 V2 1183 60% 1067 4.8 Wireless Headphones
2 Storm 1499 75% 1008 4.8 Smart Watches
3 Airdopes 141 1499 67% 598 4.8 Wireless Earbuds
4 Airdopes 121 v2 1299 57% 562 4.8 Wireless Earbuds
5 BassHeads 103 499 61% 552 4.8 Wired Headphones
6 Airdopes 402 1499 75% 479 4.8 Wireless Earbuds
7 BassHeads 100 399 60% 423 4.8 Wired Headphones
8 Rockerz 255 Pro+ 1299 67% 401 4.8 Wireless Headphones
9 Rockerz 255 Pro 1399 60% 383 4.8 Wireless Headphones

Airdopes 131 which falls under Wireless Earbuds Category has the Most Number of Reviews.

Airdopes 131 photo

In [ ]:
 

Q4. Less Rated products.¶

In [53]:
# bluetooth_speakers
bottom_5_rated_products_1 = bluetooth_speakers.sort_values(['Rate'],ascending = True).head()
bottom_5_rated_products_1 = bottom_5_rated_products_1.set_index('ProductName').reset_index()
bottom_5_rated_products_1
product_type = {'Product_Type':['Bluetooth Speakers','Bluetooth Speakers','Bluetooth Speakers','Bluetooth Speakers','Bluetooth Speakers',]}
product_type = pd.DataFrame(product_type)
bottom_5_rated_products_1['ProductType'] = product_type
bottom_5_rated_products_1

# limited_edition
bottom_5_rated_products_2 = limited_edition.sort_values(['Rate'],ascending = True).head()
bottom_5_rated_products_2 = bottom_5_rated_products_2.set_index('ProductName').reset_index()
bottom_5_rated_products_2
product_type = {'Product_Type':['Limited Edition','Limited Edition','Limited Edition','Limited Edition','Limited Edition']}
product_type = pd.DataFrame(product_type)
bottom_5_rated_products_2['ProductType'] = product_type
bottom_5_rated_products_2

# misfit
bottom_5_rated_products_3 = misfit.sort_values(['Rate'],ascending = True).head()
bottom_5_rated_products_3 = bottom_5_rated_products_3.set_index('ProductName').reset_index()
bottom_5_rated_products_3
product_type = {'Product_Type':['Misfit','Misfit','Misfit','Misfit','Misfit']}
product_type = pd.DataFrame(product_type)
bottom_5_rated_products_3['ProductType'] = product_type
bottom_5_rated_products_3

# mobile_accessories
bottom_5_rated_products_4 = mobile_accessories.sort_values(['Rate'],ascending = True).head()
bottom_5_rated_products_4 = bottom_5_rated_products_4.set_index('ProductName').reset_index()
bottom_5_rated_products_4
product_type = {'Product_Type':['Mobile Accessories','Mobile Accessories','Mobile Accessories','Mobile Accessories','Mobile Accessories']}
product_type = pd.DataFrame(product_type)
bottom_5_rated_products_4['ProductType'] = product_type
bottom_5_rated_products_4

# smart_watches
bottom_5_rated_products_5 = smart_watches.sort_values(['Rate'],ascending = True).head()
bottom_5_rated_products_5 = bottom_5_rated_products_5.set_index('ProductName').reset_index()
bottom_5_rated_products_5
product_type = {'Product_Type':['Smart Watches','Smart Watches','Smart Watches','Smart Watches','Smart Watches']}
product_type = pd.DataFrame(product_type)
bottom_5_rated_products_5['ProductType'] = product_type
bottom_5_rated_products_5

# trebel
bottom_5_rated_products_6 = trebel.sort_values(['Rate'],ascending = True).head()
bottom_5_rated_products_6 = bottom_5_rated_products_6.set_index('ProductName').reset_index()
bottom_5_rated_products_6
product_type = {'Product_Type':['TRebel','TRebel','TRebel','TRebel','TRebel',]}
product_type = pd.DataFrame(product_type)
bottom_5_rated_products_6['ProductType'] = product_type
bottom_5_rated_products_6

# wired_headphones
bottom_5_rated_products_7 = wired_headphones.sort_values(['Rate'],ascending = True).head()
bottom_5_rated_products_7 = bottom_5_rated_products_7.set_index('ProductName').reset_index()
bottom_5_rated_products_7
product_type = {'Product_Type':['Wired Headphones','Wired Headphones','Wired Headphones','Wired Headphones','Wired Headphones']}
product_type = pd.DataFrame(product_type)
bottom_5_rated_products_7['ProductType'] = product_type
bottom_5_rated_products_7

# wireless_earbuds
bottom_5_rated_products_8 = wireless_earbuds.sort_values(['Rate'],ascending = True).head()
bottom_5_rated_products_8 = bottom_5_rated_products_8.set_index('ProductName').reset_index()
bottom_5_rated_products_8
product_type = {'Product_Type':['Wireless Earbuds','Wireless Earbuds','Wireless Earbuds','Wireless Earbuds','Wireless Earbuds']}
product_type = pd.DataFrame(product_type)
bottom_5_rated_products_8['ProductType'] = product_type
bottom_5_rated_products_8

# wireless_headphones
bottom_5_rated_products_9= wireless_headphones.sort_values(['Rate'],ascending = True).head()
bottom_5_rated_products_9= bottom_5_rated_products_9.set_index('ProductName').reset_index()
bottom_5_rated_products_9
product_type = {'Product_Type':['Wireless Headphones','Wireless Headphones','Wireless Headphones','Wireless Headphones','Wireless Headphones']}
product_type = pd.DataFrame(product_type)
bottom_5_rated_products_9['ProductType'] = product_type
In [54]:
bottom_5_rated_products = pd.concat([bottom_5_rated_products_1,bottom_5_rated_products_2,bottom_5_rated_products_3,bottom_5_rated_products_4,bottom_5_rated_products_5,bottom_5_rated_products_6,bottom_5_rated_products_7,bottom_5_rated_products_8,bottom_5_rated_products_9])
In [55]:
bottom_5_rated = bottom_5_rated_products.where(bottom_5_rated_products['Reviews'] > 10).dropna().set_index('ProductName').reset_index().head(10)
bottom_5_rated
Out[55]:
ProductName ProductPrice Discount Reviews Rate ProductType
0 Rugby Plus 1999.0 60% 11.0 4.6 Bluetooth Speakers
1 Stone 190 1299.0 57% 108.0 4.8 Bluetooth Speakers
2 Stone 352 1699.0 51% 26.0 4.8 Bluetooth Speakers
3 Stone 180 939.0 62% 20.0 4.8 Bluetooth Speakers
4 Stone Grenade 1499.0 62% 92.0 4.9 Bluetooth Speakers
5 Bassheads 172 Sunburn Edition 349.0 73% 12.0 4.8 Limited Edition
6 BassHeads 152 KKR Edition 549.0 57% 21.0 4.8 Limited Edition
7 Misfit T50 Trimmer 989.0 60% 20.0 4.9 Misfit
8 Misfit T200 1699.0 58% 42.0 4.9 Misfit
9 LTG 500 Indestructible Apple Certified Lightni... 899.0 49% 26.0 4.8 Mobile Accessories

Boat Ruby Plus which falls under Bluetooth Speakers category Is Less Rated Product Among All The Other Products

Boat Ruby plus

In [ ]:
 

Q5. Estimated Total Sales Generated Only Through Reviewed Products From different Categories.¶

In [56]:
total_sales = bluetooth_speakers['Reviews'].sum() + limited_edition['Reviews'].sum() + misfit['Reviews'].sum() + mobile_accessories['Reviews'].sum() + smart_watches['Reviews'].sum() + trebel['Reviews'].sum() + wired_headphones['Reviews'].sum() + wireless_earbuds['Reviews'].sum() + wireless_headphones['Reviews'].sum() 
In [57]:
total_products = bluetooth_speakers['ProductName'].count() + limited_edition['ProductName'].count() + misfit['ProductName'].count() + mobile_accessories['ProductName'].count()+ smart_watches['ProductName'].count() + trebel['ProductName'].count()+ wired_headphones['ProductName'].count() + wireless_earbuds['ProductName'].count()+ wireless_headphones['ProductName'].count() 
In [58]:
print("\n Total Sales Generated only from reviews are {} from Total Products {}.\n".format(total_sales,total_products))
 Total Sales Generated only from reviews are 14975 from Total Products 196.

Q6. Estimated Total Amount Generated only through reviewed products from different Categories.¶

In [59]:
# bluetooth_speakers
bluetooth_speakers_total_amount = bluetooth_speakers
bluetooth_speakers_total_amount['TotalAmount'] = bluetooth_speakers_total_amount['ProductPrice'] * bluetooth_speakers_total_amount['Reviews'] 

# limited_edition
limited_edition_total_amount = limited_edition
limited_edition_total_amount['TotalAmount'] = limited_edition_total_amount['ProductPrice'] * limited_edition_total_amount['Reviews'] 

# misfit
misfit_total_amount = misfit
misfit_total_amount['TotalAmount'] = misfit_total_amount['ProductPrice'] * misfit_total_amount['Reviews'] 

# mobile_accessories
mobile_accessories_total_amount = mobile_accessories
mobile_accessories_total_amount['TotalAmount'] = mobile_accessories_total_amount['ProductPrice'] * mobile_accessories_total_amount['Reviews'] 

# smart_watches
smart_watches_total_amount = smart_watches
smart_watches_total_amount['TotalAmount'] = smart_watches_total_amount['ProductPrice'] * smart_watches_total_amount['Reviews'] 

# trebel
trebel_total_amount = trebel
trebel_total_amount['TotalAmount'] = trebel_total_amount['ProductPrice'] * trebel_total_amount['Reviews'] 

# wired_headphones
wired_headphones_total_amount = wired_headphones
wired_headphones_total_amount['TotalAmount'] = wired_headphones_total_amount['ProductPrice'] * wired_headphones_total_amount['Reviews'] 

# wireless_earbuds
wireless_earbuds_total_amount = wireless_earbuds
wireless_earbuds_total_amount['TotalAmount'] = wireless_earbuds_total_amount['ProductPrice'] * wireless_earbuds_total_amount['Reviews'] 

# wireless_headphones
wireless_headphones_total_amount = wireless_headphones
wireless_headphones_total_amount['TotalAmount'] = wireless_headphones_total_amount['ProductPrice'] * wireless_headphones_total_amount['Reviews'] 
In [60]:
total_amount = bluetooth_speakers_total_amount['TotalAmount'].sum() + limited_edition_total_amount['TotalAmount'].sum() + misfit_total_amount['TotalAmount'].sum() +  mobile_accessories_total_amount['TotalAmount'].sum()  + smart_watches_total_amount['TotalAmount'].sum() + trebel_total_amount['TotalAmount'].sum() +  wired_headphones_total_amount['TotalAmount'].sum() + wireless_headphones_total_amount['TotalAmount'].sum() + wireless_headphones_total_amount['TotalAmount'].sum()
In [61]:
print('Estimated Total Amount {} generated from Estimated Total Sales {} from only {} products. '.format(total_amount,total_sales,total_products))
Estimated Total Amount 20354530 generated from Estimated Total Sales 14975 from only 196 products. 

2 Crore Generated from 14 Thousand Customers only from 196 Products.¶

In [ ]:
 

Q7. Overall Revenue Generated from all Categories of Products.¶

In [62]:
#  Total Revenue

bluetooth_speakers_total_amount['TotalAmount'].sum() # 788266
limited_edition_total_amount['TotalAmount'].sum() # 407770
misfit_total_amount['TotalAmount'].sum() # 102326
mobile_accessories_total_amount['TotalAmount'].sum() # 208998
smart_watches_total_amount['TotalAmount'].sum() # 2955045
trebel_total_amount['TotalAmount'].sum() # 356450
wired_headphones_total_amount['TotalAmount'].sum() # 1203077
wireless_earbuds_total_amount['TotalAmount'].sum() # 6387030
wireless_headphones_total_amount['TotalAmount'].sum() # 7166299
Out[62]:
7166299
In [63]:
#  Total Sales

print(bluetooth_speakers['Reviews'].sum())   # 434
print(limited_edition['Reviews'].sum()) # 280
print(misfit['Reviews'].sum()) # 74
print(mobile_accessories['Reviews'].sum()) # 452
print(smart_watches['Reviews'].sum()) # 1605
print(trebel['Reviews'].sum()) # 250
print(wired_headphones['Reviews'].sum()) # 2373
print(wireless_earbuds['Reviews'].sum())  # 4428
print(wireless_headphones['Reviews'].sum()) # 5079
434
280
74
452
1605
250
2373
4428
5079
In [64]:
product_category = {'Product_Category': ['Bluetooth Speakers','Limited Edition','Misfit','Mobile Accessories','Smart Watches','T Rebel','Wired Headphones','Wireless Earbuds','Wireless Headphones']}
product_category
product_category = pd.DataFrame(product_category)

total_revenue= {'Total_Revenue':[788266,407770,102326,208998,2955045,356450,1203077,6387030,7166299]}
total_revenue
total_revenue = pd.DataFrame(total_revenue)

total_sales= {'Total_Sales':[434,280,74,452,1605,250,2373,4428,5079]}
total_sales = pd.DataFrame(total_sales)
In [65]:
sales_revenue_by_category = pd.concat([product_category,total_revenue,total_sales],axis = 1)
sales_revenue_by_category
Out[65]:
Product_Category Total_Revenue Total_Sales
0 Bluetooth Speakers 788266 434
1 Limited Edition 407770 280
2 Misfit 102326 74
3 Mobile Accessories 208998 452
4 Smart Watches 2955045 1605
5 T Rebel 356450 250
6 Wired Headphones 1203077 2373
7 Wireless Earbuds 6387030 4428
8 Wireless Headphones 7166299 5079
In [66]:
pe.bar(data_frame = sales_revenue_by_category,
      x = 'Product_Category',
      y = 'Total_Revenue',
      text = 'Total_Revenue',
      color = 'Product_Category',
      title = 'Estimated Total Revenue Generated from Different Categories of Products')
In [ ]:
 

Q8. Overall Sales Generated from all Categories.¶

In [67]:
sales_revenue_by_category
Out[67]:
Product_Category Total_Revenue Total_Sales
0 Bluetooth Speakers 788266 434
1 Limited Edition 407770 280
2 Misfit 102326 74
3 Mobile Accessories 208998 452
4 Smart Watches 2955045 1605
5 T Rebel 356450 250
6 Wired Headphones 1203077 2373
7 Wireless Earbuds 6387030 4428
8 Wireless Headphones 7166299 5079
In [68]:
pe.bar(data_frame = sales_revenue_by_category,
      x = 'Product_Category',
      y = 'Total_Sales',
      color = 'Product_Category',
      text = 'Total_Sales',
      title= 'Estimated Total Sales Generated from Different Categories of Products')
In [69]:
fig,axes = plt.subplots(2,1,figsize=(20,15))
fig.suptitle('Estimated Total Revenue and Total Sales Generated From Different Categories',fontsize = 20)

# Total Revenue Generated 
sns.barplot(ax=axes[0],
           x = 'Product_Category',
           y='Total_Revenue',
           data = sales_revenue_by_category.sort_values('Total_Revenue',ascending = False),
           palette = 'turbo')
axes[0].set_title('Total Revenue generated from different Categories (in Lakhs)')


#Total Sales Generated
sns.barplot(ax=axes[1],
           x = 'Product_Category',
           y='Total_Sales',
           data = sales_revenue_by_category.sort_values('Total_Revenue',ascending = False),
           palette = 'turbo')
axes[1].set_title('Estimated Total Sales Generated (only from reviews) from different Categories');
In [70]:
sales_revenue_by_category.sort_values('Total_Revenue',ascending = False).set_index('Product_Category').reset_index()
Out[70]:
Product_Category Total_Revenue Total_Sales
0 Wireless Headphones 7166299 5079
1 Wireless Earbuds 6387030 4428
2 Smart Watches 2955045 1605
3 Wired Headphones 1203077 2373
4 Bluetooth Speakers 788266 434
5 Limited Edition 407770 280
6 T Rebel 356450 250
7 Mobile Accessories 208998 452
8 Misfit 102326 74
In [ ]:
 

Q9. Highest and Lowest Discounted Products.¶

In [71]:
discount = pd.concat([bluetooth_speakers,limited_edition,misfit,mobile_accessories,smart_watches,trebel,wired_headphones,wireless_earbuds,wireless_headphones],axis = 0)
discount = discount.sort_values('Discount',ascending = True)
discount
Out[71]:
ProductName ProductPrice Discount Reviews Rate TotalAmount
0 Mask 499 17% 147 4.9 73353
8 BassHeads 220 699 30% 45 4.8 31455
11 Xplorer‌‌ O2 3999 33% 29 4.8 115971
9 Stone 500 1999 33% 8 4.9 15992
23 Rockerz 330 Pro 1799 40% 47 4.9 84553
... ... ... ... ... ... ...
1 Storm Call 1799 77% 23 4.7 41377
6 Wave Call 1799 77% 41 4.9 73759
12 Wave Pro 1499 79% 12 5.0 17988
8 Watch Flash 1399 80% 48 4.8 67152
17 Rockerz 261 849 83% 67 4.8 56883

196 rows × 6 columns

In [72]:
discount['Discount'].unique()
Out[72]:
array(['17%', '30%', '33%', '40%', '41%', '43%', '45%', '47%', '48%',
       '49%', '50%', '51%', '52%', '53%', '54%', '55%', '56%', '57%',
       '58%', '59%', '60%', '61%', '62%', '63%', '64%', '65%', '66%',
       '67%', '68%', '69%', '70%', '71%', '72%', '73%', '74%', '75%',
       '76%', '77%', '79%', '80%', '83%'], dtype=object)
In [73]:
discount
Out[73]:
ProductName ProductPrice Discount Reviews Rate TotalAmount
0 Mask 499 17% 147 4.9 73353
8 BassHeads 220 699 30% 45 4.8 31455
11 Xplorer‌‌ O2 3999 33% 29 4.8 115971
9 Stone 500 1999 33% 8 4.9 15992
23 Rockerz 330 Pro 1799 40% 47 4.9 84553
... ... ... ... ... ... ...
1 Storm Call 1799 77% 23 4.7 41377
6 Wave Call 1799 77% 41 4.9 73759
12 Wave Pro 1499 79% 12 5.0 17988
8 Watch Flash 1399 80% 48 4.8 67152
17 Rockerz 261 849 83% 67 4.8 56883

196 rows × 6 columns

In [74]:
plt.figure(figsize=(30,6))
sns.histplot(data = discount,
            x = 'Discount')
plt.xlabel('Discount',fontsize = 20)
plt.title('Range of Discounts of Products',fontsize = 20);
In [75]:
discount = discount.groupby('Discount').count().iloc[:,[0]]
discount.rename(columns = {'ProductName':'Product_Count'},inplace=True)
discount.sort_values(['Product_Count'],ascending = False)
Out[75]:
Product_Count
Discount
60% 25
67% 23
57% 22
50% 20
70% 8
55% 7
56% 7
47% 7
62% 7
75% 6
65% 5
61% 5
43% 5
58% 4
69% 4
40% 4
64% 3
48% 3
53% 3
77% 2
76% 2
52% 2
66% 2
33% 2
49% 2
74% 1
80% 1
73% 1
79% 1
72% 1
71% 1
17% 1
68% 1
63% 1
30% 1
59% 1
54% 1
51% 1
45% 1
41% 1
83% 1
In [76]:
print("\n Highest Discount Percentage - {}\n".format(discount.index.max()))
 Highest Discount Percentage - 83%

In [77]:
print("\n Least Discount Percentage - {}\n".format(discount.index.min()))
 Least Discount Percentage - 17%

In [78]:
print("\n Total Number of Discount Percentage Offers - {}\n".format(discount['Product_Count'].count()))
 Total Number of Discount Percentage Offers - 41

In [ ]:
 

Q10. Prices Range Scatterplot.¶

In [79]:
trebel
Out[79]:
ProductName ProductPrice Discount Reviews Rate TotalAmount
0 TRebel Blaze 3999 43% 6 5.0 23994
1 TRebel Rockerz 333 1599 60% 6 5.0 9594
2 TRebel Airdopes 381 2199 56% 12 4.8 26388
3 TRebel Airdopes 402 1999 67% 9 4.8 17991
4 TRebel BassHeads 100 399 60% 17 4.9 6783
5 TRebel Xtend‌ 3499 56% 3 5.0 10497
6 TRebel Rockerz 450 1499 62% 1 5.0 1499
7 TRebel BassHeads 103 499 61% 11 5.0 5489
8 TRebel Airdopes 141 1499 67% 63 5.0 94437
9 TRebel BassHeads 102 399 69% 7 5.0 2793
10 TRebel Airdopes 171 - Wireless Earbuds with 6m... 1799 70% 3 5.0 5397
11 TRebel Matrix 4999 58% 1 5.0 4999
12 TRebel Rockerz 330 Pro 1799 40% 3 5.0 5397
13 Trebel Rockerz 255 Pro+ 1599 60% 7 4.7 11193
14 TRebel Airdopes 181 1599 47% 4 4.8 6396
15 TRebel Watch Straps 399 56% 1 5.0 399
16 TRebel Rockerz 235 V2 1199 60% 20 4.8 23980
17 TRebel Airdopes 131 1299 57% 56 4.7 72744
18 TRebel BassHeads 152 499 61% 11 5.0 5489
19 TRebel Airdopes 441 Pro 2999 50% 5 4.8 14995
20 TRebel Rockerz 255 Pro 1499 57% 4 5.0 5996
In [80]:
trebel['ProductName'].iloc[10]
Out[80]:
'TRebel Airdopes 171 - Wireless Earbuds with 6mm Drivers, 380mAh Charging cum Carrying Case, IPX4 Sweat and Water Resistance, 13H nonstop Music'
In [81]:
# Changing 'TRebel Airdopes 171 - Wireless Earbuds with 6mm Drivers, 380mAh Charging cum Carrying Case, IPX4 Sweat and Water Resistance, 13H nonstop Music' to 'TRebel Airdopes 171'
trebel['ProductName'].loc[10] = 'TRebel Airdopes 171'
trebel
C:\Users\Sreejith\AppData\Local\Temp\ipykernel_17028\3497724289.py:2: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Out[81]:
ProductName ProductPrice Discount Reviews Rate TotalAmount
0 TRebel Blaze 3999 43% 6 5.0 23994
1 TRebel Rockerz 333 1599 60% 6 5.0 9594
2 TRebel Airdopes 381 2199 56% 12 4.8 26388
3 TRebel Airdopes 402 1999 67% 9 4.8 17991
4 TRebel BassHeads 100 399 60% 17 4.9 6783
5 TRebel Xtend‌ 3499 56% 3 5.0 10497
6 TRebel Rockerz 450 1499 62% 1 5.0 1499
7 TRebel BassHeads 103 499 61% 11 5.0 5489
8 TRebel Airdopes 141 1499 67% 63 5.0 94437
9 TRebel BassHeads 102 399 69% 7 5.0 2793
10 TRebel Airdopes 171 1799 70% 3 5.0 5397
11 TRebel Matrix 4999 58% 1 5.0 4999
12 TRebel Rockerz 330 Pro 1799 40% 3 5.0 5397
13 Trebel Rockerz 255 Pro+ 1599 60% 7 4.7 11193
14 TRebel Airdopes 181 1599 47% 4 4.8 6396
15 TRebel Watch Straps 399 56% 1 5.0 399
16 TRebel Rockerz 235 V2 1199 60% 20 4.8 23980
17 TRebel Airdopes 131 1299 57% 56 4.7 72744
18 TRebel BassHeads 152 499 61% 11 5.0 5489
19 TRebel Airdopes 441 Pro 2999 50% 5 4.8 14995
20 TRebel Rockerz 255 Pro 1499 57% 4 5.0 5996
In [82]:
# add all df with product name and prices and then plot
prices_1 = bluetooth_speakers[['ProductName','ProductPrice']]
prices_2 = limited_edition[['ProductName','ProductPrice']]
prices_3 = misfit[['ProductName','ProductPrice']]
prices_4 = mobile_accessories[['ProductName','ProductPrice']]
prices_5 = smart_watches[['ProductName','ProductPrice']]
prices_6 = trebel[['ProductName','ProductPrice']]
prices_7 = wired_headphones[['ProductName','ProductPrice']]
prices_8 = wireless_earbuds[['ProductName','ProductPrice']]
prices_9 = wireless_headphones[['ProductName','ProductPrice']]
In [83]:
prices = pd.concat([prices_1,prices_2,prices_3,prices_4,prices_5,prices_6,prices_7,prices_8,prices_9])
prices
Out[83]:
ProductName ProductPrice
0 Stone 1000v2 3999
1 Stone Grenade 1499
2 Stone 190 1299
3 Stone 352 1699
4 Stone 650 1999
... ... ...
35 Rockerz 450R 1499
36 Rockerz 425 1499
37 Rockerz 450 Batman DC Edition 1299
38 Rockerz 255 ARC 1299
39 Trebel Rockerz 255 Pro+ 1599

196 rows × 2 columns

In [84]:
pe.scatter(data_frame = prices,
            x = 'ProductPrice',
          y = 'ProductName',
          width = 1000,
          height = 3000,
          title = ' Products Price Range')

Most of the BoAt products range between ₹ 199 to ₹ 3000.¶

Only 1 Product that is T-Rebel Matrix range in highest price category that is ₹ 5000.¶



T-Rebel Matrix

T-Rebel Matrix

Conclusion

  1. Imported all the necessary Libraries like Pandas, Numpy, Seaborn, Matplotlib, Plotly.

  2. Imported all 9 csv files into specific dataframe.

  3. Cleaned all the messy data files in 7 stages, that includes

    Stage 1 :- Dropping duplicates on basis of ProductName.
    Stage 2 :- Triming spaces in ProductName using strip.
    Stage 3 :- Cleaning ProductPrice column and making data type integer.
    Stage 4 :- Cleaning NumberofReviews and making data type as integer.
    Stage 5 :- Cleaning Discount Column.
    Stage 6 :- Cleaning Rate and making data type as float.
    Stage 7 :- Dropping Review and Summary columns.

  4. Performed Exploratory Data Analysis [EDA] and asked 10 questions.
    Q1. Retrieved names of Top 20 Product with Most Reviews and Discount.
    Q2. Retrieved Top 10 Products with Highest Rating and Highest Reviews.
    Q3. Retrieved Product with Most Number of Reviews.
    Q4. Retrieved Product with Less Ratings.
    Q5. Estimated Sales generated Only Through Reviewed Customers.
    Q6. Estimated Amount Generated Only through Reviewed Customers.
    Q7. Overall Revenue Generated From different Categories.
    Q8. Overall Sales Generated from different Categories.
    Q9. Highest and Lowest Discounted Products
    Q10. Prices Range Scatterplot

Thank You!